Wikipedia Database Dumps
Commenters primarily suggest using official Wikipedia database dumps, Wikidata, DBpedia, and tools like Kiwix for offline access instead of scraping Wikipedia.
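For context, the dumps the commenters point to are published at dumps.wikimedia.org and the articles-only English dump can be fetched in a single request. Below is a minimal Python sketch, assuming the standard "latest" naming pattern (enwiki-latest-pages-articles.xml.bz2) and a hypothetical local output path:

import shutil
import urllib.request

# Assumed URL: the "latest" articles-only English Wikipedia dump, following the
# standard naming pattern used on dumps.wikimedia.org.
DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"
OUT_PATH = "enwiki-latest-pages-articles.xml.bz2"  # hypothetical local path

def download_dump(url: str = DUMP_URL, out_path: str = OUT_PATH) -> None:
    """Stream the compressed dump to disk without holding it in memory."""
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)

if __name__ == "__main__":
    download_dump()

Kiwix ZIM files, also mentioned in the summary, are the more convenient option when the goal is offline browsing rather than processing the raw data.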
Sample Comments
So weird to scrape Wikipedia when you can just download db dumps from them.
There are downloadable, offline versions of wikipedia.
Why not just download Wikipedia from Wikipedia? https://en.wikipedia.org/wiki/Wikipedia:Database_download
Can't you just download a Wikipedia archive?
All from Wikipedia/Wikidata. Now they're served from my own database.
there are many ways to download wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Database_download
In lieu of scraping Wikipedia, could this project be sped up by downloading the instance of Wikipedia itself? It's not that jumbo of a file size.
I'm not sure what your use case is so maybe this isn't helpful, but Wikipedia has weekly or so database dumps that you can download, as well as static HTML (although that might be more out of date): https://en.wikipedia.org/wiki/Wikipedia:Database_download
I'm sorry if I didn't understand. Wouldn't a JSON or XML type data structure (where some Wikipedia stuff is already stored) support this?
You can download Wikipedia data; with only articles, it's around 20 GB.
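Following up on the comments about the XML structure and the roughly 20 GB articles-only dump: the file can be processed as a stream without fully decompressing it first. A minimal sketch using only the standard library, assuming the local file name from the earlier sketch and stripping whatever MediaWiki export namespace the dump declares:

import bz2
import xml.etree.ElementTree as ET

DUMP_PATH = "enwiki-latest-pages-articles.xml.bz2"  # assumed local path

def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(path: str = DUMP_PATH):
    """Yield (title, wikitext) pairs while streaming the bz2-compressed dump."""
    with bz2.open(path, "rb") as dump:
        title, text = None, None
        for _, elem in ET.iterparse(dump, events=("end",)):
            tag = local_name(elem.tag)
            if tag == "title":
                title = elem.text
            elif tag == "text":
                text = elem.text or ""
            elif tag == "page":
                yield title, text or ""
                title, text = None, None
                elem.clear()  # release the finished page's subtree

if __name__ == "__main__":
    # Peek at the first few pages to confirm the dump parses as expected.
    for i, (page_title, wikitext) in enumerate(iter_pages()):
        print(page_title, len(wikitext))
        if i >= 4:
            break

For repeated queries, the approach one commenter describes, loading the dump into a local database once, avoids re-parsing the XML on every run.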