Is there a proper way to detect Windows-1251 encoded characters?
IMO, unlike multi-byte encodings, Windows-1251 is an 8-bit character encoding, so it is impossible to distinguish it from other 8-bit encodings such as latin1. If I am wrong about this, please correct me.
My first clue is the locale: I treat all non-ASCII characters as Windows-1251 if the locale is ru.
Are there any better ways?
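The locale heuristic described above could be sketched roughly like this in Python. The function name `decode_tag_bytes` and the Latin-1 fallback for non-Russian locales are assumptions made for illustration; they are not taken from any library.

```python
def decode_tag_bytes(raw: bytes, language: str) -> str:
    """Hypothetical helper: guess the 8-bit encoding from the locale language."""
    # Pure ASCII decodes identically in both encodings, so no guess is needed.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")
    # Assumption: a Russian locale implies Windows-1251, anything else Latin-1.
    encoding = "cp1251" if language == "ru" else "latin-1"
    # errors="replace" because byte 0x98 is unassigned in Windows-1251.
    return raw.decode(encoding, errors="replace")


# The same six bytes, interpreted under two different locales.
sample = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])
print(decode_tag_bytes(sample, "ru"))  # Привет
print(decode_tag_bytes(sample, "en"))  # Ïðèâåò
```

The guess is only as good as the assumption that the device locale matches the language of the tags, which fails for a Russian file on a device set to another language.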
UPDATE:
Here is the context of my question: there are some Windows-1251 encoded characters in the ID3 info of an MP3 file. I have to detect the Windows-1251 encoded characters and then convert them to UTF-16 using icu4c, otherwise they will be displayed as unreadable text on my system (Android). I thought some of you might know a better way.
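Once the bytes are assumed to be Windows-1251, the conversion step itself is mechanical. Here is a minimal sketch in Python rather than icu4c, only to show the shape of the conversion; the function name `cp1251_to_utf16` is hypothetical, and the choice of little-endian UTF-16 without a BOM is an assumption. With icu4c the equivalent step would presumably go through a converter opened for windows-1251.

```python
def cp1251_to_utf16(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode Windows-1251 bytes as UTF-16LE (no BOM)."""
    # Step 1: interpret the bytes as Windows-1251 to get Unicode text.
    # Byte 0x98 is unassigned in this code page, hence errors="replace".
    text = raw.decode("cp1251", errors="replace")
    # Step 2: serialize that text as UTF-16, little-endian.
    return text.encode("utf-16-le")


# 0xCF 0xF0 is "Пр" in Windows-1251, i.e. U+041F U+0440.
print(cp1251_to_utf16(bytes([0xCF, 0xF0])).hex())  # 1f044004
```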
Given only an array of 8-bit characters as input, there is no reliable way to detect which 8-bit encoding was used for those characters.
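A small Python illustration of why this is so: the same byte sequence decodes without error under several 8-bit encodings, so a successful decode proves nothing about which one was intended. The list of candidate encodings is just an assumed sample.

```python
# Six bytes that spell "Привет" if they are read as Windows-1251.
raw = bytes([0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])

# An assumed sample of 8-bit encodings; each one accepts these bytes.
candidates = ["cp1251", "latin-1", "cp1252", "koi8-r"]

# Decode with the default strict error handling: none of these raises.
decoded = {enc: raw.decode(enc) for enc in candidates}

for enc, text in decoded.items():
    print(f"{enc:8} -> {text}")
```

Every candidate yields six valid characters. Only knowledge from outside the bytes (a locale, a declared charset, or a guess about the language) can pick between them.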