robots.txt Compliance

This cluster covers discussions of whether web crawlers, especially Google's, honor robots.txt files, whether robots.txt is legally enforceable, and what alternatives exist for controlling how a site is scraped.
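For readers new to the mechanics: robots.txt is a plain-text file of `User-agent` groups with `Disallow` rules, and a well-behaved crawler checks it before each request. Nothing in the file forces that check. Below is a minimal sketch of the crawler side using Python's standard-library `urllib.robotparser`. The rules, the agent name `ExampleBot/1.0`, the example.com URLs, and the helper names `build_parser` and `polite_fetch_allowed` are all hypothetical. The text is parsed from a string, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while every other user agent ("*") is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before every request; nothing forces it to."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("Googlebot", "https://example.com/blog/post.html"),
        ("Googlebot", "https://example.com/private/notes.html"),
        ("ExampleBot/1.0", "https://example.com/blog/post.html"),
    ]:
        print(agent, url, polite_fetch_allowed(rules, agent, url))
```

This should print True for the first URL and False for the other two. Googlebot matches its own group, and every other agent falls through to the `*` group. Compliance lives entirely in the crawler's decision to call `can_fetch` and obey the answer.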

➡️ Stable 1.1x Web Development
Comments: 3,710
Years active: 20
Top authors: 5
Topic ID: #3029

Activity Over Time (comments per year)

Year  Comments
2007  6
2008  26
2009  63
2010  128
2011  223
2012  159
2013  154
2014  161
2015  139
2016  100
2017  208
2018  118
2019  206
2020  155
2021  152
2022  209
2023  435
2024  392
2025  637
2026  39

Keywords

AI MSFT yusuf.fyi robots.txt HTML robotstxt.org what.html ycombinator.com robot.txt quora.com robots txt txt robots disallow user agent file google scrape agent files

Sample Comments

nittanymount, Mar 2, 2024

curious, will the robots.txt be really honored? maybe legal issue if not?

nubinetwork, Jul 7, 2024

Most robots don't honour robots.txt anyways...

raincole, Sep 6, 2023

Does Google have a track record of not respecting robots.txt? Otherwise why is it a problem?

smilliken, Nov 29, 2014

In practice robots.txt is just a suggestion; even Google doesn't fully respect it.

nubinetwork, Dec 26, 2023

Just because you put something in robots.txt doesn't mean they will adhere to it.

mlepath, Dec 30, 2024

Naive question, do people no longer respect robots.txt?

LoSboccacc, Jul 14, 2016

you know about robots.txt right?

Tloewald, May 6, 2015

You'd prefer they didn't honor robots.txt at all?

quesera, Dec 10, 2012

Respecting robots.txt is probably the best plan.

Breza, Sep 14, 2019

Google respects robots.txt files from webmasters who don't want Google to scrape their content
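Several comments above point out that robots.txt is only advisory: a crawler can read the `Disallow` rules and ignore them. One alternative for controlling scraping is to enforce a policy on the server itself. Below is a minimal sketch, assuming a Python WSGI site. The blocklist tokens (`examplebot`, `hypotheticalscraper`) and the helpers `block_user_agents`, `site_app`, and `simulate_request` are hypothetical. The `User-Agent` header is self-reported, so this only stops scrapers that identify themselves honestly.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical blocklist: lowercase substrings of User-Agent headers to refuse.
BLOCKED_AGENT_TOKENS = ("examplebot", "hypotheticalscraper")


def block_user_agents(app, blocked=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests from blocked user agents get 403 Forbidden.

    Unlike robots.txt, the server enforces this rule. It still relies on the
    User-Agent header, which a scraper can fake.
    """
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def site_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_user_agents(site_app)


def simulate_request(wsgi_app, user_agent):
    """Call the app directly with a minimal test environ; returns (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        return lambda data: None  # no-op write() callable, as WSGI expects

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(simulate_request(app, "ExampleBot/2.0"))
    print(simulate_request(app, "Mozilla/5.0"))
```

To try it as a real server, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()`. Production sites would typically apply the same kind of rule in their web server or CDN instead of in application code.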