I have a list of bytes (8-bit values; in C/C++ terms they make up a wchar_t string) that together form a Unicode string, byte by byte. How can I convert these values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the whole string from that. Thank you.
Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.
If you actually have a list of bytes, you first need to join them into that object: use ''.join(bytelist) in Python 2.x or b''.join(bytelist) in Python 3.x.
You need to specify the encoding that was used to encode the original Unicode string.
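Since the question mentions two bytes per character, the data may well be UTF-16 rather than UTF-8. The following is only a sketch for Python 3 under that assumption: the byte values are made up for illustration (the word "тест" in little-endian UTF-16), and the actual encoding and byte order of your data have to be known from wherever the bytes came from.

```python
# Hypothetical input: the word "тест" as UTF-16 little-endian,
# stored as a list of integer byte values (two bytes per character).
byte_values = [0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04]

# Build a bytes object from the integers, then decode it with the
# encoding that was (assumed to be) used to produce the bytes.
raw = bytes(byte_values)
text = raw.decode('utf-16-le')
print(text)

# Decoding with a different encoding does not necessarily fail;
# here it silently produces different, meaningless text.
wrong = raw.decode('utf-8')
print(repr(wrong))
```

Note that the wrong encoding does not raise an error in this case, which is why the encoding cannot be reliably guessed from the bytes alone.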
However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.
Demo for Python 2:
In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']
In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'
In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест
In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
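For comparison, here is a rough Python 3 counterpart of the demo above. It assumes the same UTF-8 data; the variable names are only illustrative. In Python 3 the list elements are bytes objects (or plain integers), and the result of decode() is a str.

```python
# Same UTF-8 bytes as in the Python 2 demo, as single-byte bytes objects.
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

# Join into one bytes object, then decode to a (Unicode) str.
joined = b''.join(bytelist)
text = joined.decode('utf-8')
print(text)

# If the list holds integers instead, bytes() builds the same object.
int_list = [0xd1, 0x82, 0xd0, 0xb5, 0xd1, 0x81, 0xd1, 0x82]
same = bytes(int_list)
print(same == joined)
```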