Google Crawling and Indexing
This cluster discusses how Google and other search engines crawl websites, whether they index all crawled content, and related issues like robots.txt, noindex tags, SEO implications, and challenges for third-party crawlers.
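robots.txt rules are grouped by user agent, so a site can let Googlebot in while turning other crawlers away. That is one reason third-party crawlers have a harder time. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, and the `LocalSearchBot` user-agent name are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while the catch-all "*" group shuts every other crawler out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it with read()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/listings/cafe"

for agent in ("Googlebot", "LocalSearchBot"):
    # can_fetch() uses the group whose User-agent matches this agent,
    # falling back to the "*" group when none does.
    print(agent, parser.can_fetch(agent, url))
# Googlebot True
# LocalSearchBot False
```

A real crawler would point `set_url()` and `read()` at the live file instead of a hard-coded string. robots.txt only governs fetching. It does not decide whether a URL ends up in an index.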
Sample Comments
Google doesn't index everything it crawls.
I wonder if they do this so their content can still get indexed by Google.
That sounds really useful! Will Google penalize you for this, though?
Not if you want to be crawled properly by Google and Bing, it isn't.
Presumably the site being scraped is getting some value from being included in Google's index. If they aren't, they can always opt out.
True, but it makes it hard for those of us working on local search sites to crawl their sites too. :/
Perhaps to avoid search engine indexing?
Do you have a robots.txt entry that's stopping Google from fetching them? That can counter-intuitively cause Google to index pages.
Could be due to the long-form content you index. I've found that those sorts of sites tend to be less reluctant to let third parties crawl them. It's also possible you're better at writing crawlers than I am.
I've heard that they [sometimes] visit but don't index. Is that true?
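Several of these comments turn on the difference between crawling (fetching a page) and indexing (adding it to the search index). A page opts out of indexing with a `noindex` directive. The directive can appear in a `<meta name="robots">` tag or in an `X-Robots-Tag` HTTP header. A crawler only sees that directive if robots.txt lets it fetch the page. This is why a robots.txt block can, as one commenter says, leave a URL indexed: the page's `noindex` is never read, and the URL can still end up in the index if other pages link to it.

Below is a minimal sketch, assuming the page's headers and HTML are already in hand. The function names, the sample page, and the robots.txt strings are hypothetical, and real search engines apply more rules than this.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(headers: dict[str, str], html: str) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag says noindex.

    Simplifications: header lookup is case-sensitive, and agent-specific
    forms such as "googlebot: noindex" are not recognized.
    """
    header = headers.get("X-Robots-Tag", "")
    header_directives = {t.strip().lower() for t in header.split(",")}
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


def noindex_takes_effect(robots_txt: str, agent: str, url: str,
                         headers: dict[str, str], html: str) -> bool:
    """A noindex can only be honored if robots.txt lets the agent fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url) and has_noindex(headers, html)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
blocked = "User-agent: *\nDisallow: /drafts/\n"
url = "https://example.com/drafts/post"

print(has_noindex({}, page))  # True: the meta tag says noindex
print(noindex_takes_effect(blocked, "Googlebot", url, {}, page))  # False: robots.txt hides it
```

The practical takeaway, under these assumptions, is to keep a page crawlable if you want its `noindex` to be seen. Blocking the page in robots.txt and marking it `noindex` work against each other.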