
How to convert ISO-2022-CN text to UTF-8 in Java?


I have a Java application that needs to convert a string encoded in ISO-2022-CN to UTF-8. However, when I try to do this using the following code:

new String("Text".getBytes("ISO-2022-CN"), StandardCharsets.UTF_8);

I get a java.lang.UnsupportedOperationException.

After some research, I learned that Java's ISO-2022-CN charset supports only decoding, not encoding. However, I still need to convert ISO-2022-CN text to UTF-8. How can I achieve this in Java?
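The restriction can be seen through the standard `java.nio.charset.Charset` API, where `canEncode()` reports whether a charset supports encoding at all. A minimal check, assuming a JDK that ships the `jdk.charsets` module (where ISO-2022-CN lives); the class name `Iso2022CnCapabilities` is only for illustration:

```java
import java.nio.charset.Charset;

public class Iso2022CnCapabilities {
    // True when this runtime provides an ISO-2022-CN charset at all.
    static boolean isSupported() {
        return Charset.isSupported("ISO-2022-CN");
    }

    // The JDK implementation is decode-only, so this is expected to be false.
    static boolean canEncode() {
        return Charset.forName("ISO-2022-CN").canEncode();
    }

    public static void main(String[] args) {
        System.out.println("supported: " + isSupported());
        System.out.println("canEncode: " + canEncode());
        try {
            Charset.forName("ISO-2022-CN").newEncoder();
        } catch (UnsupportedOperationException e) {
            // The same exception type the question reports from getBytes("ISO-2022-CN").
            System.out.println("newEncoder() threw " + e.getClass().getName());
        }
    }
}
```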

I tried using the following code:

import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

public class Main {
  public static void main(String[] args) throws Exception {
    String iso2022cn = "Text";

    // Decode ISO-2022-CN to Unicode
    CharsetDecoder decoder = Charset.forName("ISO-2022-CN").newDecoder();
    decoder.onMalformedInput(CodingErrorAction.REPORT);
    decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    // Fails here: getBytes("ISO-2022-CN") needs an encoder, which this charset lacks
    ByteBuffer iso2022cnBytes = ByteBuffer.wrap(iso2022cn.getBytes("ISO-2022-CN"));
    CharBuffer unicodeChars = decoder.decode(iso2022cnBytes);

    // Encode Unicode to UTF-8
    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    ByteBuffer utf8Bytes = encoder.encode(unicodeChars);

    // Note: array() can include unused bytes beyond utf8Bytes.limit()
    String utf8String = new String(utf8Bytes.array(), "UTF-8");
    System.out.println(utf8String);
  }
}

But this code also does not work for me.
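The decode half of this attempt is sound; what fails is `iso2022cn.getBytes("ISO-2022-CN")`, which asks for the unsupported encoder. Decoding has to start from raw ISO-2022-CN bytes (for example, bytes read from a file or socket), and the encoded UTF-8 buffer should be read up to its limit rather than through `array()`. A sketch that keeps the same `CharsetDecoder`/`CharsetEncoder` structure, assuming the input bytes are already available; the class `Iso2022CnToUtf8` and the hand-built sample bytes are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Iso2022CnToUtf8 {
    // Converts raw ISO-2022-CN bytes to UTF-8 bytes; throws on malformed or unmappable input.
    static byte[] convert(byte[] iso2022cnBytes) throws CharacterCodingException {
        // Decode ISO-2022-CN bytes to Unicode (the JDK supports this direction).
        CharsetDecoder decoder = Charset.forName("ISO-2022-CN").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer unicodeChars = decoder.decode(ByteBuffer.wrap(iso2022cnBytes));

        // Encode the Unicode chars as UTF-8.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer utf8Bytes = encoder.encode(unicodeChars);

        // The backing array may be longer than the encoded data, so copy only
        // the remaining bytes instead of returning utf8Bytes.array().
        byte[] result = new byte[utf8Bytes.remaining()];
        utf8Bytes.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Hand-built sample: ESC $ ) A designates GB2312, SO shifts into it,
        // 0x56 0x50 and 0x4E 0x44 are 中 and 文 in 7-bit form, SI shifts back.
        byte[] input = {0x1B, '$', ')', 'A', 0x0E, 0x56, 0x50, 0x4E, 0x44, 0x0F};
        byte[] utf8 = convert(input);
        System.out.println(utf8.length + " UTF-8 bytes: " + new String(utf8, StandardCharsets.UTF_8));
    }
}
```

In practice the `input` array would come from wherever the ISO-2022-CN data originates, such as `Files.readAllBytes(path)`.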

Can anyone suggest a solution or provide an alternative approach to convert ISO-2022-CN text to UTF-8 in Java?


Solution

Try this method:

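    import java.io.ByteArrayOutputStream;
    import java.nio.charset.Charset;

    // Hand-rolled ISO-2022-CN decoder. It only understands the GB2312 designation
    // (ESC $ ) A) and the SO/SI shifts; CNS 11643 sequences are not handled.
    // Each char of src must hold one raw byte, e.g. new String(bytes, StandardCharsets.ISO_8859_1).
    // The result is an ordinary Java String; call getBytes(StandardCharsets.UTF_8) for UTF-8 bytes.
    public static String iso2022cn2utf8(String src) {
        ByteArrayOutputStream gb2312 = new ByteArrayOutputStream();
        boolean isConvert = false; // true while shifted out (SO) to GB2312
    
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (c) {
                case 0x1b: // ESC: skip the three bytes of the designation, e.g. "$)A"
                    i += 3;
                    break;
                case 0x0e: // SO: the following byte pairs are GB2312 characters
                    isConvert = true;
                    break;
                case 0x0f: // SI: back to ASCII
                    isConvert = false;
                    break;
                default:
                    // Setting the high bit turns 7-bit GB2312 bytes into EUC-CN bytes.
                    gb2312.write(isConvert ? (c | 0x80) : c);
                    break;
            }
        }
    
        // Decode the collected EUC-CN bytes; the JDK's GB2312 charset reads this form.
        return new String(gb2312.toByteArray(), Charset.forName("GB2312"));
    }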
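When the JDK's built-in ISO-2022-CN charset is available, an alternative to parsing escape sequences by hand is to let it decode the bytes and write the text back out as UTF-8. A streaming sketch using `InputStreamReader`/`OutputStreamWriter`, assuming the data arrives as a byte stream; the class `Iso2022CnStreams` and method `transcode` are hypothetical names. A reader built from a `Charset` replaces malformed input with U+FFFD rather than throwing, so pass a configured `CharsetDecoder` to `InputStreamReader` if strict checking is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Iso2022CnStreams {
    // Reads ISO-2022-CN bytes from `in` and writes the same text as UTF-8 to `out`.
    // Neither stream is closed here; the caller owns them.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, Charset.forName("ISO-2022-CN"));
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push any buffered chars through to `out`
    }

    public static void main(String[] args) throws IOException {
        // Hand-built sample: ESC $ ) A, SO, 7-bit GB2312 bytes for 中文, SI
        byte[] sample = {0x1B, '$', ')', 'A', 0x0E, 0x56, 0x50, 0x4E, 0x44, 0x0F};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(sample), sink);
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

For files, the same method can be called with streams from `Files.newInputStream(source)` and `Files.newOutputStream(target)` inside a try-with-resources block.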