Web Search Indexing
The cluster centers on debates about whether a new search engine builds its own web crawler and index or relies on third-party sources like Common Crawl, Bing, or Google, emphasizing the high costs, scalability challenges, and feasibility of independent indexing.
Sample Comments
Wouldn't it be prohibitively expensive for you to crawl and index the web?
Maybe they are using Common Crawl, Web Archive, or Yandex as indexes?
Do they pull their own index like Brave, or are they using Bing/Google in the background?
Is it uninteresting, or just not scalable for a human to dig through? Is it a content problem or an index/search problem?
Google can do (1) because they crawl the entire web constantly. That's not going to be a trivial feature to add for anyone else. (2) is also probably a side effect of crawling pretty much everything before you ask for it.
I can't tell from a quick perusal: does it do its own crawling and manage its own index? Or is it more like DDG, just using Bing, Yandex, etc?
Indexing the entire web isn't instantaneous, you know.
Do you index webpages yourself or piggyback off Bing/Google?
Google doesn't index everything and they don't even return results from their index with perfect precision. You have to make some trade-offs at that scale.
+1 for being super ambitious. Full disclosure: I work at Google (though not on web search). Your search actually sucks, perhaps because your index is woefully inadequate. How many pages are in it? Maybe you should use Common Crawl?