Web Search Indexing

This cluster centers on debates about whether a new search engine builds its own web crawler and index or relies on third-party sources such as Common Crawl, Bing, or Google, with particular attention to the high cost, scalability challenges, and feasibility of independent indexing.

📉 Falling 0.5x Startups & Business
Comments: 2,901
Years Active: 20
Top Authors: 5
Topic ID: #5029

Activity Over Time

Comments per year:

2007: 13
2008: 60
2009: 84
2010: 94
2011: 138
2012: 94
2013: 93
2014: 89
2015: 72
2016: 118
2017: 153
2018: 145
2019: 211
2020: 223
2021: 244
2022: 328
2023: 242
2024: 216
2025: 243
2026: 41

Keywords

bbc.co e.g CPU mitta.us nyt.com DuckDuckGo WWW DDG archive.org DeuSu index crawl search crawling indexing google crawler web search engine pages

Sample Comments

rohit89 Aug 25, 2022

Wouldn't it be prohibitively expensive for you to crawl and index the web?

is_true Oct 7, 2024

maybe they are using commoncrawl, webarchive, yandex as indexes?

throwaway12345t Sep 25, 2025

Do they pull their own index like brave or are they using Bing/Google in the background?

throw_nbvc1234 May 5, 2023

is it uninteresting or just not scalable for a human to dig through; content vs index/search problem?

mprovost Mar 14, 2013

Google can do (1) because they crawl the entire web constantly. That's not going to be a trivial feature to add for anyone else. (2) is also probably a side effect of crawling pretty much everything before you ask for it.

losvedir Sep 14, 2018

I can't tell from a quick perusal: does it do its own crawling and manage its own index? Or is it more like DDG, just using Bing, Yandex, etc?

j2kun Jan 17, 2018

Indexing the entire web isn't instantaneous, you know.

rkudeshi Jan 10, 2021

Do you index webpages yourself or piggyback off Bing/Google?

nwellnhof Feb 25, 2022

Google doesn't index everything and they don't even return results from their index with perfect precision. You have to make some trade-offs at that scale.

robrenaud Jul 11, 2013

+1 for being super ambitious. Full disclosure: I work at Google (though not on web search). Your search actually sucks, perhaps because your index is woefully inadequate. How many pages are in it? Maybe you should use common crawl?