I'm not sure how to deal with this encoding. I have a file containing the 5,000 most common Russian words, with data that looks like this:
1 36358.94 Ë misc
2 27792.36 ‚ prep
3 20689.51 ÌÂ misc
4 18942.62 ÓÌ pron
5 16588.14 ̇ prep
6 15631.11 ˇ pron
7 12546.08 ˜ÚÓ misc...
I know that the third field on each line is the Cyrillic word, but I don't know how to turn those characters back into Cyrillic. If anyone could help, that would be great.
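Try opening the file with the Windows-1251 (cp1251) encoding; that should help. Your sample looks like Windows-1251 bytes displayed as Mac OS Roman: reinterpreted that way, the first few words come out as и, в, не, он.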
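If you're doing this in code, here is a minimal Python sketch of both routes. It assumes the file on disk really is Windows-1251 and that the characters you pasted are those bytes shown as Mac OS Roman; the function names and the four-column layout (rank, frequency, word, part of speech) are my own guesses from your sample, so adjust them if your file differs.

```python
def fix_mojibake(text):
    """Undo cp1251 bytes that were mistakenly decoded as Mac OS Roman.

    Only needed if you already have the garbled text as a string;
    raises UnicodeEncodeError if the text was garbled some other way.
    """
    return text.encode("mac_roman").decode("cp1251")


def read_words(path):
    """Read the word list directly, decoding the raw bytes as cp1251."""
    rows = []
    with open(path, encoding="cp1251") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or unexpected lines
            rank, freq, word, pos = parts
            rows.append((int(rank), float(freq), word, pos))
    return rows


if __name__ == "__main__":
    # Third line of the sample, written with escapes so the garbled
    # characters (U+00CC, U+00C2) survive copy and paste.
    sample = "3 20689.51 \u00cc\u00c2 misc"
    print(fix_mojibake(sample))
```

Reading the file with `encoding="cp1251"` is the cleaner option when you still have the original file; `fix_mojibake` is only for text that has already been decoded the wrong way. If the output still looks wrong, the file may be in a different Cyrillic encoding (KOI8-R, for example), so it's worth trying another codec name.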