Python: Convert Integer to UTF-16-LE


I have an integer value of 29827 (0x7483) and I want to convert it into the Unicode Han character 'glass' (U+7483) (see http://www.fileformat.info/info/unicode/char/7483/index.htm) using UTF-16-LE encoding.

I managed to convert this number into a 3-byte UTF-8 encoding (code points from 2048 up to 65535 take 3 bytes in UTF-8) with

s = '\u%s' % hex(int_to_encode)[2:]
file.write(s.decode('unicode-escape').encode('utf-8'))
file.close()

But I found out that the required encoding is UTF-16-LE. In the intended encoding, an integer is also represented in 3 bytes (this is why I thought my first attempt was correct, since it also produced 3 bytes for one integer).

Thanks a lot for your help,

Kind regards


Solution

  • First, to convert a number to a character, use chr() (Python 3) or unichr() (Python 2). Then, to encode as UTF-16-LE, simply specify that encoding instead of UTF-8.

    So in Python 2 (where file is the already-open file object from the question):

    int_to_encode = 0x7483
    s = unichr(int_to_encode)
    file.write(s.encode('utf-16-le'))
    file.close()
    

    In either Python 2 or Python 3 you can specify the file encoding when you open it with io.open() (in Python 3, use chr() in place of unichr()):

    import io
    s = unichr(0x7483)
    with io.open('foo', 'w', encoding='utf-16-le') as f:
        f.write(s)
    

    A Python 2 console session demonstrating this:

    >>> import io
    >>> with io.open('foo', 'w', encoding='utf-16-le') as f:
    ...     f.write(unichr(0x7483))
    ...
    1L
    >>> with io.open('foo', 'r', encoding='utf-16-le') as f:
    ...     print(f.read())
    ...
    璃
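
A minimal sketch of the Python 3 equivalent, using chr() in place of unichr(). The temporary directory and the variable names (code_point, utf8_bytes, utf16_bytes, raw) are assumptions chosen for illustration. It also compares the byte counts: the character takes 3 bytes in UTF-8 but only 2 bytes in UTF-16-LE.

```python
import io
import os
import tempfile

code_point = 0x7483   # 29827 in decimal, the value from the question
ch = chr(code_point)  # Python 3: chr() replaces Python 2's unichr()

utf8_bytes = ch.encode('utf-8')       # 3 bytes
utf16_bytes = ch.encode('utf-16-le')  # 2 bytes, low byte first

print(utf8_bytes.hex())   # e79283
print(utf16_bytes.hex())  # 8374

# Write the character through a text-mode file opened as UTF-16-LE,
# then read the raw bytes back to confirm what landed on disk.
# The temporary directory is only a stand-in for a real output location.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'foo')
    with io.open(path, 'w', encoding='utf-16-le') as f:
        f.write(ch)
    with io.open(path, 'rb') as f:
        raw = f.read()

print(raw == utf16_bytes)  # True
```

Note that 'utf-16-le' writes no byte order mark, whereas the plain 'utf-16' codec prepends one when encoding.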