Search code examples
javaencodingutf-8text-decoding

Unable to decode сyrillic text with Java


I have the following text:

Анна Меркулова

With help of the following online decoder https://2cyr.com/decode/?lang=en I was able to decode mentioned string to the correct one:

Анна Меркулова

enter image description here

Source encoding is UTF-8 and the target is WINDOWS-1251

but I still unable to do it programmatically in Java:

String utf8String = new String("Анна Меркулова".getBytes(), "UTF-8");
String ansiString = new String(utf8String.getBytes("UTF-8"), "windows-1251");
System.out.println(ansiString);

returns

Анна Меркулова

What am I doing wrong and how to properly convert the string?


Solution

  • You're trying to assign the String(s) a Charset, but what you really need to do is extract the bytes with a specific Charset

    final byte[] bytes = "Анна Меркулова".getBytes("UTF-8");
    final String utf8String = new String(bytes);
    final byte[] bytes1 = utf8String.getBytes("windows-1251");
    final String ansiString = new String(bytes1);
    

    And by the way, you don't need all of that

    final byte[] bytes = "Анна Меркулова".getBytes("windows-1251");
    final String result = new String(bytes);