In my application I'm getting the user info from LDAP and sometimes the full username comes in a wrong charset. For example:
ТеÑÑ61 ТеÑÑовиÑ61
It can also be in English or in Russian and displayed correctly. If the username changes, it's updated in the database. Even if I change the value in the db, it won't solve the problem.
I can fix it before saving by doing this:
new String(incorrect.getBytes("ISO-8859-1"), "UTF-8");
However, if I apply it to a string that already contains correct Russian characters (e.g., "Тест61 Тестович61"), I get something like "????61 ????????61".
Can you please suggest something that can determine the charset of a string?
Strings in Java, AFAIK, do not retain their original encoding; they are always stored internally as Unicode (UTF-16). What you want is to detect the charset of the original stream/bytes, which is why I think your String.getBytes() call is too late.
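If the raw bytes are out of reach and only the already-decoded String is available, a guarded version of the round-trip from the question may help. This is only a heuristic sketch, assuming the broken values are UTF-8 bytes that were decoded as ISO-8859-1 (which is what the fix in the question implies); the class and method names are made up for illustration. It leaves a string alone when it contains characters outside ISO-8859-1 (such as real Cyrillic) or when its bytes are not valid UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class MojibakeRepair {

    // Returns the repaired string, or the input unchanged if it does not
    // look like UTF-8 that was misread as ISO-8859-1.
    static String repairIfMisdecoded(String s) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        if (!latin1.canEncode(s)) {
            // Contains characters outside ISO-8859-1 (e.g. correct Cyrillic),
            // so getBytes("ISO-8859-1") would turn them into '?'.
            return s;
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        // Strict decoder: report invalid UTF-8 instead of silently replacing it.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Bytes are not valid UTF-8, so the string was probably fine as it was.
            return s;
        }
    }
}
```

Plain ASCII names pass through unchanged, since ASCII is identical in both encodings. The check can still misfire on a genuine ISO-8859-1 string whose bytes happen to form valid UTF-8, so treat it as a best-effort guard rather than real detection.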
Ideally, if you can get at the input stream you are reading from, you can run it through something like this: http://code.google.com/p/juniversalchardet/
There are plenty of other charset detectors out there as well.
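If a full detector is more than you need, and assuming you can obtain the raw bytes, a strict UTF-8 decode with a fallback is a simple stand-in (it is not what the detector libraries do internally). The sketch below uses only the JDK; the class name and the choice of fallback charset are assumptions you would adapt to your data:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class StrictDecode {

    // Decodes raw bytes as UTF-8 if they are valid UTF-8,
    // otherwise decodes them with the given fallback charset.
    static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the legacy charset the caller expects.
            return new String(raw, fallback);
        }
    }
}
```

This works reasonably well because text in a single-byte legacy encoding rarely happens to be valid UTF-8 once it contains non-ASCII characters, but it cannot tell two legacy charsets apart; that is where a statistical detector is the better tool.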