I am trying to do some research on Chinese people using Wikipedia data. Rather than using DBpedia (its information about Chinese people is rather limited compared to zh.wikipedia.org), I found that I can download the dumps directly from zhwiki: http://download.wikipedia.com/zhwiki/20150301/.
I see there is an index file, and in it I can see rows such as: 966576:291:人物
I assume this is a lookup key? Can someone tell me how to use it to find a page in the main file or database?
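There are two files: the bz2-compressed dump and its index file.
The index file has lines of the form offset:page_id:title, for example 966576:291:人物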
offset is the starting byte offset of a bz2 stream inside the dump file. Read the bytes from offset1 to offset2 (offset2 being the next larger offset in the index) from the bz2 file and pass them to a bz2 decoder; it will give you the XML dump of the 100 pages in that stream.
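A minimal Python sketch of that lookup, in case it helps. The function names and the file paths are my own placeholders, not anything defined by the dump, and it assumes the index has already been decompressed to plain text. Titles can contain colons, so the index line is split at most twice.

```python
import bz2

def parse_index_line(line):
    """Split 'offset:page_id:title' into (int, int, str)."""
    # Titles may themselves contain ':', so split at most twice.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title

def find_stream_bounds(index_lines, wanted_title):
    """Return (start, end) byte offsets of the stream holding wanted_title.

    end is None when the title sits in the last stream listed in the index.
    """
    start = None
    for line in index_lines:
        offset, _page_id, title = parse_index_line(line)
        if start is None:
            if title == wanted_title:
                start = offset
        elif offset > start:
            # First larger offset marks where the next stream begins.
            return start, offset
    if start is None:
        raise KeyError(wanted_title)
    return start, None

def read_stream(dump_path, start, end=None):
    """Decompress one bz2 stream and return its XML text."""
    with open(dump_path, "rb") as f:
        f.seek(start)
        data = f.read() if end is None else f.read(end - start)
    # BZ2Decompressor stops at the end of the first stream it sees,
    # so any trailing bytes after that stream are simply ignored.
    return bz2.BZ2Decompressor().decompress(data).decode("utf-8")
```

Usage would be something like opening the index with open("index.txt", encoding="utf-8") (hypothetical path), calling find_stream_bounds(f, "人物"), and passing the result to read_stream together with the path of the dump. The returned text is a run of page elements without a root element, so wrap it in one before handing it to an XML parser.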