shell unicode iconv gbk

iconv example: converting UTF-16BE Simplified Chinese text to GBK


I am trying to convert UTF-16BE text with a BOM to GBK using the iconv command line. Can somebody help me with a Linux command example for this? "Good morning" in Simplified Chinese, according to Google Translate, is 早上好 (hex fe ff 65 e9 4e 0a 59 7d). How can I convert this to GBK?

I have tried the command below, but it failed with an error:

$ iconv -f UTF-16BE -t GBK goodmorning.txt
iconv: goodmorning.txt:1:0: cannot convert

Solution

  • The problem is the first code point, U+FEFF, the byte order mark (BOM). With UTF-16 encodings, if you specify the endianness directly (via UTF-16BE or UTF-16LE), iconv treats a BOM at the beginning as just another character, and since that character doesn't exist in the GBK encoding, you get an error. If you leave the endianness out of the encoding name (just UTF-16), iconv uses the BOM, if present, to determine the byte order, and your text will be converted.

    Some examples might make it clearer:

    $ perl -e 'print pack("(H2)*", @ARGV)' fe ff 65 e9 4e 0a 59 7d > chinese.txt
    $ file chinese.txt
    chinese.txt: Unicode text, UTF-16, big-endian text, with no line terminators
    $ iconv -f UTF-16BE -t GBK < chinese.txt > out.txt # Failure
    iconv: illegal input sequence at position 0
    $ iconv -f UTF-16 -t GBK < chinese.txt > out.txt # Success
    $ perl -e 'print pack("(H2)*", @ARGV)' 65 e9 4e 0a 59 7d > chinese.txt # No BOM
    $ iconv -f UTF-16BE -t GBK < chinese.txt > out.txt # Success
    $ iconv -f GBK -t UTF-8 < out.txt; echo # Verify the characters
    早上好
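
If you would rather keep the explicit UTF-16BE name, another option is to drop the BOM yourself before converting. The following is only a rough sketch of that idea, not something from the question: the file names goodmorning.txt and goodmorning.gbk are placeholders, and it assumes your iconv build supports GBK and that your `head` accepts `-c` (common, but not guaranteed everywhere).

```shell
# Illustrative file names; adjust to your own.
in=goodmorning.txt
out=goodmorning.gbk

# Recreate the sample from the question: BOM (fe ff) followed by
# the UTF-16BE bytes 65 e9 4e 0a 59 7d, written as octal escapes.
printf '\376\377\145\351\116\012\131\175' > "$in"

# Read the first two bytes as hex; this is "feff" when a big-endian BOM is present.
bom=$(head -c 2 "$in" | od -An -tx1 | tr -d ' \n')

if [ "$bom" = "feff" ]; then
    # Skip the 2-byte BOM, then convert with the explicit encoding name.
    tail -c +3 "$in" | iconv -f UTF-16BE -t GBK > "$out"
else
    # No BOM: the file can be converted as-is.
    iconv -f UTF-16BE -t GBK < "$in" > "$out"
fi

# Round-trip back to UTF-8 to check the characters.
iconv -f GBK -t UTF-8 < "$out"; echo
```

For a one-off conversion, `-f UTF-16` is simpler. The explicit check is mainly useful when you receive a mix of files, some with a BOM and some without, and want the same command to handle both.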