
GBK encoding of the string "赵孟頫"


Here is the code in Python interactive mode:

>>> s = u'赵孟頫'
>>> s.encode('gbk')
'\xd5\xd4\xc3\xcf\xee\\'

Why does the GBK string have a trailing backslash?


Solution

  • In [8]: '\xd5\xd4\xc3\xcf\xee\\' == '\xd5\xd4\xc3\xcf\xee\x5c'
    Out[8]: True
    

    The trailing backslash is just the byte '\x5c'.

    In [9]: hex(ord('\\'))
    Out[9]: '0x5c'
    
    In [10]: '\x5c'
    Out[10]: '\\'
    

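    In Python 2, a str is just a sequence of bytes, and the final byte here happens to have the value 0x5c, the same as a backslash encoded in ASCII. GBK encodes each of these three characters as two bytes, and the second byte of a GBK pair is allowed to fall in the ASCII range: 頫 is encoded as '\xee\x5c'. When Python prints the repr of a string, it shows bytes as printable ASCII characters when possible, and it escapes a literal backslash as '\\'. The backslash is therefore the second half of 頫, not an extra character appended to the string.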
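
    The same check can be sketched in Python 3, where encode() returns a bytes object rather than a str. This is only an illustration, not part of the original answer: the variable names (encoded, pairs, hex_pairs, is_backslash) are made up for the example, and it assumes every character in the string takes two bytes in GBK, which holds for this name.

```python
# Python 3 sketch: encode the name and look at the raw bytes.
s = '赵孟頫'
encoded = s.encode('gbk')

# Assumption: each character here occupies exactly two bytes in GBK,
# so the byte string can be split into fixed-size pairs.
pairs = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]
hex_pairs = [p.hex() for p in pairs]

# Indexing a bytes object gives an int in Python 3.
last_byte = encoded[-1]
is_backslash = last_byte == ord('\\')

print(hex_pairs)                    # one hex string per character
print(hex(last_byte))               # value of the final byte
print(is_backslash)                 # True if it equals ASCII backslash
print(encoded.decode('gbk') == s)   # round trip back to the original text
```

    Decoding the bytes with the same codec gives back the original three characters, which shows that the final 0x5c byte belongs to 頫.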