Web Search Indexing

This cluster collects debates over whether a new search engine builds its own web crawler and index or relies on third-party sources such as Common Crawl, Bing, or Google. Recurring themes are the high cost, the scalability challenges, and the feasibility of independent indexing.

📉 Falling (0.5x) · Startups & Business

Comments: 2,901

Topic ID: #5029

Activity Over Time

[Chart: yearly activity, 2007 to 2026]

Top Contributors

marginalia_nu (71), ChuckMcM (29), boyter (26), greglindahl (18), ColinHayhurst (13)

Keywords

bbc.co e.g CPU mitta.us nyt.com DuckDuckGo WWW DDG archive.org DeuSu index crawl search crawling indexing google crawler web search engine pages

Sample Comments

rohit89 • Aug 25, 2022

Wouldn't it be prohibitively expensive for you to crawl and index the web?

is_true • Oct 7, 2024

maybe they are using commoncrawl, webarchive, yandex as indexes?

throwaway12345t • Sep 25, 2025

Do they pull their own index like brave or are they using Bing/Google in the background?

throw_nbvc1234 • May 5, 2023

is it uninteresting or just not scalable for a human to dig through; content vs index/search problem?

mprovost • Mar 14, 2013

Google can do (1) because they crawl the entire web constantly. That's not going to be a trivial feature to add for anyone else. (2) is also probably a side effect of crawling pretty much everything before you ask for it.

losvedir • Sep 14, 2018

I can't tell from a quick perusal: does it do its own crawling and manage its own index? Or is it more like DDG, just using Bing, Yandex, etc?

j2kun • Jan 17, 2018

Indexing the entire web isn't instantaneous, you know.

rkudeshi • Jan 10, 2021

Do you index webpages yourself or piggyback off Bing/Google?

nwellnhof • Feb 25, 2022

Google doesn't index everything and they don't even return results from their index with perfect precision. You have to make some trade-offs at that scale.

robrenaud • Jul 11, 2013

+1 for being super ambitious. Full disclosure: I work at Google (though not on web search). Your search actually sucks, perhaps because your index is woefully inadequate. How many pages are in it? Maybe you should use common crawl?