
How to detect Windows-1251 encoded characters


Is there a proper way to detect Windows-1251 encoded characters?

IMO, unlike multi-byte encodings, Windows-1251 is a single-byte (8-bit) character encoding, so it seems impossible to distinguish it from other 8-bit encodings such as Latin-1. If I am wrong about this, please correct me.

The first clue I can think of is the locale: I treat all non-ASCII characters as Windows-1251 if the locale is ru.

Are there any better ways?

UPDATE:

Here is the context of my question: some MP3 files have Windows-1251 encoded characters in their ID3 info. I have to detect those characters and convert them to UTF-16 using icu4c; otherwise they are displayed as unreadable text on my system (Android). I thought some of you might know a better way.


Solution

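  • There is no reliable way to detect which 8-bit encoding was used when the only input is an array of 8-bit characters. Almost every byte sequence is valid in more than one single-byte encoding, so any detection can only be a heuristic guess, for example based on the locale or on how often certain byte values occur.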
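
To make the "heuristic guess" concrete, here is a minimal sketch in Python (chosen only for brevity; the question itself uses icu4c). It is not a reliable detector, just one plausible way to combine the locale hint from the question with a simple byte statistic. The function names `cyrillic_ratio` and `guess_decode` and both thresholds are assumptions made up for this illustration. The idea: Russian text is made almost entirely of bytes in the Windows-1251 letter range, while Western European text in Latin-1 is mostly ASCII letters with only an occasional accented one.

```python
def cyrillic_ratio(data: bytes) -> float:
    """Share of letter-like bytes that fall in the Windows-1251 Cyrillic range."""
    # In Windows-1251, 0xC0-0xFF encode the letters А..я; 0xA8 and 0xB8 are Ё and ё.
    cyrillic = sum(1 for b in data if b >= 0xC0 or b in (0xA8, 0xB8))
    # Plain ASCII letters A-Z and a-z.
    ascii_letters = sum(1 for b in data if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)
    total = cyrillic + ascii_letters
    return cyrillic / total if total else 0.0


def guess_decode(data: bytes, locale_hint: str = "") -> str:
    """Decode bytes of unknown encoding; a guess, not a guarantee."""
    # UTF-8 has a strict structure, so text in an 8-bit encoding rarely passes
    # a strict UTF-8 decode by accident. Pure ASCII also takes this path.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Hypothetical thresholds: trust the Cyrillic reading sooner on a "ru" locale.
    threshold = 0.3 if locale_hint.startswith("ru") else 0.6
    if cyrillic_ratio(data) >= threshold:
        # 0x98 is unassigned in Windows-1251, hence errors="replace".
        return data.decode("cp1251", errors="replace")
    # Latin-1 maps every byte to a character, so this fallback never fails.
    return data.decode("latin-1")
```

For example, `guess_decode("Привет".encode("cp1251"))` takes the Windows-1251 branch, while the Latin-1 bytes of "café" fall through to the Latin-1 fallback because only one of the four letters is a high byte. Once the text is decoded, producing UTF-16 is a separate step (in Python, `text.encode("utf-16-le")`). Short strings and text that mixes scripts can still fool this check. If pulling in more of ICU is acceptable, its charset detector (the `ucsdet_*` functions in icu4c) applies a statistical approach of the same kind, and its result is likewise only a confidence-ranked guess.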