Google Crawling and Indexing

This cluster discusses how Google and other search engines crawl websites, whether they index all crawled content, and related issues like robots.txt, noindex tags, SEO implications, and challenges for third-party crawlers.

Trend: Stable 0.5x
Category: Web Development
Comments: 3,126
Years Active: 20
Top Authors: 5
Topic ID: #4100

Activity Over Time (comments per year)

2007: 5
2008: 77
2009: 97
2010: 170
2011: 246
2012: 160
2013: 201
2014: 113
2015: 159
2016: 77
2017: 108
2018: 159
2019: 181
2020: 165
2021: 175
2022: 220
2023: 313
2024: 229
2025: 250
2026: 21

Keywords

intellectualpropertyblawg.com JS robots.txt sitemaps.org PR YC isittoxicformydog.com YouTube google.com i.e google index indexed search crawl site crawler search engines sites engines

Sample Comments

tyingq Apr 27, 2019

Google doesn't index everything it crawls.

lfender6445 Apr 21, 2015

I wonder if they do this so their content can still get indexed by Google.

tocomment Jun 25, 2013

That sounds really useful! Will Google penalize you for this, though?

walshemj Nov 19, 2013

Not if you want to be crawled properly by Google and Bing, it isn't.

criddell Aug 13, 2019

Presumably the site being scraped is getting some value from being included in Google's index. If they aren't, they can always opt out.

jshen Dec 28, 2010

True, but it makes it hard for those of us working on local search sites to crawl their sites too :/

Linkd Feb 18, 2016

Perhaps to avoid search engine indexing?

detaro Jul 18, 2018

Do you have a robots.txt entry that's stopping Google from fetching them? That can counter-intuitively cause Google to index pages.

boyter Nov 7, 2021

Could be due to the long-form content you index. I've found those sorts of sites tend to be less reluctant toward third parties. It's also possible you are more talented than I am at writing crawlers.

pbhjpbhj Jun 1, 2018

I've heard that they [sometimes] visit but don't index, is that true?
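
Illustration

detaro's point about robots.txt can be sketched in code. A crawler that obeys robots.txt never fetches a disallowed page, so it never sees a noindex directive on that page; the URL can then still end up indexed if other pages link to it. The sketch below uses Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the helper name may_fetch are all hypothetical, chosen only for illustration, and this is not a description of how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL.

    Only a page that may be fetched can have its noindex directive read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one under the disallowed path, one outside it.
blocked = may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html")
allowed = may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/public/page.html")

print("private page fetchable:", blocked)
print("public page fetchable:", allowed)
```

If the goal is to keep a page out of the index, the usual approach is to leave it crawlable and serve a noindex directive, rather than disallowing it in robots.txt.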