"EOFError: Ran out of input" while using Wikipedia Extractor to parse a Wikipedia dump file

I'm trying to convert a bz2 file to text with Wikipedia Extractor. I downloaded a Wikipedia dump with the .bz2 extension, then ran this on the command line:

python -b 85M -o extracted D:\wikiextractor-master\wikiextractor\zhwiki-latest-pages-articles.xml.bz2

After it finished preprocessing the pages, it failed with the error shown in the title.

How can I fix this?


  • I ran into this problem too. It is likely caused by the StringIO issue on Windows. I re-ran it on Windows Subsystem for Linux (WSL) and it went well.
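
For context on the message itself: "Ran out of input" is what Python's pickle module raises when it is asked to load from an empty stream. On Windows, multiprocessing starts workers with the "spawn" method and hands them pickled state, so a worker that receives nothing would fail this way. That this is what happens inside Wikipedia Extractor is an assumption here, not something confirmed by the question. The helper name below is made up for illustration; it is a minimal sketch that only reproduces the message:

```python
import io
import multiprocessing
import pickle


def load_from_empty_stream():
    """Return the message pickle raises when there is nothing to read."""
    try:
        # An empty byte stream stands in for a worker that received no data.
        pickle.load(io.BytesIO(b""))
    except EOFError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # "spawn" on Windows; Linux (including WSL) uses a different default.
    print("start method:", multiprocessing.get_start_method())
    print("EOFError:", load_from_empty_stream())
```

If the start method prints as "spawn", that is consistent with the Windows-only behaviour described above, and running under WSL sidesteps it.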