
Download wiki in one or more files


I would like to load data from Wikipedia for a task in Hadoop. I found some links: http://www.kiwix.org/wiki/Main_Page#Wikipedia_files, https://archive.org/details/enwiki-20160113. But I'm not sure what format the data will be in or how to work with it. So my question is: does anybody know whether it is possible to download Wikipedia as one or more .txt files?


Solution

  • You can download the most recent complete dump of Wikipedia content here (another dump, dated 20161101, is still in progress): https://dumps.wikimedia.org/enwiki/20161020/ Note that I don't think this includes the media files themselves, and this link covers only the English site; dumps for the other language editions are available there too.