Wikipedia Database Dumps

Comments primarily suggest using official Wikipedia database dumps, Wikidata, DBpedia, and tools like Kiwix for offline access instead of scraping Wikipedia.
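Downloading rather than scraping is straightforward in practice. Below is a minimal Python sketch that streams the English Wikipedia pages-articles dump from dumps.wikimedia.org; the "latest" filename follows the published naming scheme, but check the dump index for the current run before depending on it.

```python
import requests

# Official dump host; "latest" is a symlink to the most recent complete run.
DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-articles.xml.bz2")

def download_dump(url: str = DUMP_URL, dest: str = "enwiki.xml.bz2") -> None:
    """Stream the dump to disk so the multi-GB file never sits in memory."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                out.write(chunk)

if __name__ == "__main__":
    download_dump()
```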

➡️ Stable (0.5x) · Databases
1,856 comments · 20 years active · 5 top authors · Topic ID #7732

Activity Over Time

2007: 5 · 2008: 20 · 2009: 38 · 2010: 36 · 2011: 56 · 2012: 82 · 2013: 110
2014: 80 · 2015: 68 · 2016: 70 · 2017: 93 · 2018: 66 · 2019: 89 · 2020: 140
2021: 225 · 2022: 181 · 2023: 195 · 2024: 111 · 2025: 170 · 2026: 21

Keywords

kiwix.org, WordNet, WikiData, ZIM, wikidata.org, kaggle.com, robots.txt, GB, Q2567859, americantheatre.org, wikipedia download, just download, dumps, database dump, offline archive, rdbms, crawl

Sample Comments

0x457 · Aug 25, 2025

So weird to scrape Wikipedia when you can just download a DB dump from them.

stickfigure · Aug 25, 2025

There are downloadable, offline versions of wikipedia.
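The offline versions commenters point to are mostly Kiwix's ZIM archives from kiwix.org. A hedged sketch of reading one, assuming the python-libzim package and following its documented reader API; the ZIM filename is illustrative:

```python
from libzim.reader import Archive  # pip install libzim

def show_main_page(path: str = "wikipedia_en_all_nopic.zim") -> None:
    """Open a Kiwix ZIM archive and print the start of its main page."""
    zim = Archive(path)
    item = zim.main_entry.get_item()  # main_entry resolves to the front page
    print(bytes(item.content).decode("utf-8")[:500])

if __name__ == "__main__":
    show_main_page()
```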

Phithagoras · Dec 10, 2021

Why not just download Wikipedia from Wikipedia? https://en.wikipedia.org/wiki/Wikipedia:Database_download
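Once a dump is on disk, it can be read without decompressing it in full. This standard-library sketch streams page titles straight out of the bz2-compressed XML export (the local filename is an assumption):

```python
import bz2
import xml.etree.ElementTree as ET

def iter_titles(path: str = "enwiki.xml.bz2"):
    """Yield page titles from a pages-articles dump, streaming the XML."""
    with bz2.open(path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # strip the MediaWiki namespace
            if tag == "title":
                yield elem.text
            elif tag == "page":
                elem.clear()  # drop the finished page subtree to bound memory

if __name__ == "__main__":
    for i, title in enumerate(iter_titles()):
        print(title)
        if i >= 9:
            break
```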

gruez · Apr 15, 2024

Can't you just download a Wikipedia archive?

mcjiggerlog · Sep 26, 2018

All from Wikipedia/Wikidata. Now they're served from my own database.
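For structured facts, the Wikidata Query Service can stand in for both scraping and a local database. A sketch against the public SPARQL endpoint, using the stock "house cats" example query from the service's own documentation:

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Ten items that are instances of house cat (Q146), with English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

def run_query(query: str = QUERY) -> list[dict]:
    resp = requests.get(
        ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "dump-digest-example/0.1"},  # WDQS asks for a UA
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in run_query():
        print(row["item"]["value"], row["itemLabel"]["value"])
```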

bllguo · Oct 20, 2020

There are many ways to download Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Database_download

TheMichaelJohn · Jul 12, 2021

In lieu of scraping Wikipedia, could this project be sped up by downloading the instance of Wikipedia itself? It's not that jumbo of a file size.

Rebelgecko · Mar 25, 2020

I'm not sure what your use case is, so maybe this isn't helpful, but Wikipedia has roughly weekly database dumps that you can download, as well as static HTML (although that might be more out of date): https://en.wikipedia.org/wiki/Wikipedia:Database_download
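To see which runs are currently published, the dump index can be listed directly. A sketch that pulls the dated run directories from the enwiki index page; the href pattern is an assumption about the index's current HTML and may need adjusting:

```python
import re
import requests

INDEX = "https://dumps.wikimedia.org/enwiki/"

def list_dump_runs() -> list[str]:
    """Collect the dated (YYYYMMDD) run directories from the dump index."""
    html = requests.get(INDEX, timeout=60).text
    return sorted(set(re.findall(r'href="(\d{8})/"', html)))

if __name__ == "__main__":
    runs = list_dump_runs()
    print("latest run:", runs[-1] if runs else "none found")
```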

amrrs · Jul 2, 2020

I'm sorry if I didn't understand. Wouldn't a JSON or XML type data structure (where some Wikipedia data is already stored) support this?
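On the JSON side, Wikidata publishes full-entity dumps (latest-all.json.bz2 under dumps.wikimedia.org/wikidatawiki/entities/). The file is a single JSON array written one entity per line, so it can be streamed without parsing the whole document; a sketch:

```python
import bz2
import json

def iter_entities(path: str = "latest-all.json.bz2"):
    """Yield Wikidata entities from the JSON dump, one line at a time."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line in ("[", "]"):  # array brackets sit on their own lines
                continue
            yield json.loads(line.rstrip(","))  # entity lines end with a comma

if __name__ == "__main__":
    for i, entity in enumerate(iter_entities()):
        label = entity.get("labels", {}).get("en", {}).get("value")
        print(entity["id"], label)
        if i >= 4:
            break
```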

jenno · Feb 18, 2014

You can download Wikipedia data; with only articles, it's around 20 GB.