java, linux, character-encoding, utf-16, smpp

Convert charset in Windows and Linux


I'm building an SMPP gateway that receives a byte[] array of Indian characters and converts it to a readable string, which is then forwarded by email. On a Windows machine, this code works:

byte[] data= ....;
shortMessage = new String(data, GSMCharset.forName("UTF-16"));

On Linux, however, it produces rubbish.

I tried other charset options, but none of them gave readable text. Any ideas how to make it work on Linux?

(The SMPP data coding is 8, i.e. UCS2.)


Solution

  • It seems the encoding of the output follows the encoding configured for the source file. Unless an encoding is specified at compile time (How can I specify the encoding of Java source files?), the default encoding is inherited from the OS.

    My guess is that the Windows machine you used had a default encoding that produced the output you expected, while the Linux machine did not. See this question for a similar reported issue: Charset of Java source file and failing test.

    I was able to reproduce the behavior and also found a fix: changing the encoding of the source file. Read on for details.

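    I ran the following code under two different encodings.

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    // The statements below go inside a main method.
    System.out.println(Charset.defaultCharset().toString());
    byte[] data = new byte[] {9, 22, 9, 65, 9, 54, 9, 22, 9, 44, 9, 48, 9, 64};
    System.out.println(Arrays.toString(data));
    System.out.println(new String(data, StandardCharsets.UTF_16));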
    

    Using default encoding of OS

    In my case, the default was "MacRoman" on my Mac. The output is this:

    MacRoman
    [9, 22, 9, 65, 9, 54, 9, 22, 9, 44, 9, 48, 9, 64]
    ???????
    

    Using UTF-8 encoding

    I changed the encoding of the source file to UTF-8 (see the "Properties" of the source file) and ran the code again. The output is this:

    UTF-8
    [9, 22, 9, 65, 9, 54, 9, 22, 9, 44, 9, 48, 9, 64]
    खुशखबरी
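
One way to take the platform default out of the picture is to name the charset explicitly on both sides: when decoding the bytes and when printing the result. The following is a minimal sketch, not the gateway's actual code. The class name `Ucs2Decode` and the helper `decode` are hypothetical, and it assumes the payload is big-endian UCS2/UTF-16, which matches the sample bytes above and the data coding of 8 mentioned in the question.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Ucs2Decode {

    // Decodes the payload as big-endian UTF-16 (assumption: data coding 8, no BOM).
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Same sample bytes as in the answer above.
        byte[] data = {9, 22, 9, 65, 9, 54, 9, 22, 9, 44, 9, 48, 9, 64};
        String shortMessage = decode(data);

        // Wrap System.out in a stream that always encodes as UTF-8,
        // so Charset.defaultCharset() does not affect the printed bytes.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(shortMessage);
    }
}
```

On a terminal that renders UTF-8, this should print खुशखबरी on either OS. If the string is forwarded by email rather than printed, the same idea applies: the charset used to encode the message body would need to be set explicitly instead of relying on the default.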