Google Crawling and Indexing

This cluster discusses how Google and other search engines crawl websites, whether they index all crawled content, and related issues like robots.txt, noindex tags, SEO implications, and challenges for third-party crawlers.

➡️ Stable 0.5x Web Development

Comments

3,126

Topic ID

#4100

Activity Over Time

[Chart: yearly activity, 2007 to 2026]

Top Contributors

marginalia_nu (25), franze (22), bhartzer (13), AznHisoka (13), greglindahl (12)

Keywords

intellectualpropertyblawg.com, JS, robots.txt, sitemaps.org, PR, YC, isittoxicformydog.com, YouTube, google.com, i.e, google, index, indexed, search, crawl, site, crawler, search engines, sites, engines

Sample Comments

tyingq • Apr 27, 2019

Google doesn't index everything it crawls.

lfender6445 • Apr 21, 2015

I wonder if they do this so their content can still get indexed by Google.

tocomment • Jun 25, 2013

That sounds really useful! Will Google penalize for this, though?

walshemj • Nov 19, 2013

Not if you want to be crawled properly by Google and Bing, it isn't.

criddell • Aug 13, 2019

Presumably the site being scraped is getting some value from being included in Google's index. If they aren't, they can always opt out.
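
One common reading of "opt out" here is the standard noindex signal: a `<meta name="robots" content="noindex">` tag in the page or an `X-Robots-Tag: noindex` response header. The Python sketch below shows how a crawler might check for those two signals. The names `RobotsMetaParser` and `has_noindex` are hypothetical, the header check is a simple substring match, and none of this is meant to describe how Google itself parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the page asks not to be indexed (illustrative check only)."""
    for name, value in (headers or {}).items():
        # Rough check: treats any X-Robots-Tag value mentioning noindex as an opt-out.
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives
```

For example, `has_noindex('<meta name="robots" content="noindex, nofollow">')` reports an opt-out, while a page with `content="index, follow"` does not.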

jshen • Dec 28, 2010

True, but it makes it hard for those of us working on local search sites to crawl their sites too :/

Linkd • Feb 18, 2016

Perhaps to avoid search engine indexing?

detaro • Jul 18, 2018

Do you have a robots.txt entry that's stopping Google from fetching them? That can counter-intuitively cause Google to index pages.
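
The usual explanation for this effect is that robots.txt controls fetching, not indexing: a crawler that is not allowed to fetch a page never sees a noindex directive on it, so the URL can still be indexed from links that point to it. The sketch below uses Python's standard `urllib.robotparser` to test whether a given crawler could fetch a page, and therefore whether it could see a noindex on it. The function name `can_see_noindex`, the sample robots.txt, and the `example.com` URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_see_noindex(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots.txt lets user_agent fetch url.

    A noindex directive on the page is only visible to a crawler that is
    allowed to fetch the page in the first place.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Blocked: a noindex tag on this page would never be read.
print(can_see_noindex(ROBOTS_TXT, "https://example.com/private/page.html"))  # False

# Fetchable: a noindex tag on this page can be honored.
print(can_see_noindex(ROBOTS_TXT, "https://example.com/public/page.html"))  # True
```

Under this reading, keeping a page out of the index means allowing the fetch and serving noindex, not disallowing the URL in robots.txt.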

boyter • Nov 7, 2021

Could be due to the long-form content you index. I found those sorts of sites tend to be less reluctant toward third parties. It's also possible you are more talented than I am at writing crawlers.

pbhjpbhj • Jun 1, 2018

I've heard that they [sometimes] visit but don't index, is that true?