python hex

Going between hex bytes and strings


I am sure the solution is easy, but I haven't found it online. I might just not be searching the right way.

I want to be able to easily convert between a character and its byte and hex representations in Python. I usually do it as follows:

If I have a character, say chr(97), the letter "a", I can get its byte representation by

byte_character=chr(97).encode()

or its hex representation by

hex_character=chr(97).encode().hex()

and to go back I can use

bytes.fromhex(hex_character).decode()

byte_character.decode()

This works fine for most characters, but for some of them the encoding uses more than one byte. An example is chr(140), which gives 2 bytes when encoded:

chr(140).encode()

gives

b'\xc2\x8c'

rather than just b'\x8c' as I expect. Can you explain what I am doing wrong?


Solution

  • If all you need is the 0 .. 255 byte range, you can use the latin1 (ISO-8859-1) encoding:

    >>> chr(140).encode('latin1')
    b'\x8c'
    >>> chr(255).encode('latin1')
    b'\xff'
    >>> chr(256).encode('latin1')
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)
    

    Your original attempt was using the UTF-8 encoding by default, which emits multiple bytes for code points above 127:

    >>> chr(140).encode()
    b'\xc2\x8c'
    >>> chr(127).encode()
    b'\x7f'
    >>> chr(128).encode()
    b'\xc2\x80'
    >>> chr(12345).encode()
    b'\xe3\x80\xb9'
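
Putting the two observations together, the round trip from the question can be sketched with latin-1 instead of the default UTF-8. This is only a minimal sketch: the helper names (char_to_hex, hex_to_char, char_to_bits, bits_to_char) are made up for illustration, and it assumes every character has a code point in the 0 .. 255 range. Anything above that raises UnicodeEncodeError, as shown above.

```python
# Hypothetical helper names; assumes code points 0..255 only.

def char_to_hex(ch: str) -> str:
    """Two-digit hex string of the single latin-1 byte for ch."""
    return ch.encode("latin-1").hex()


def hex_to_char(hex_str: str) -> str:
    """Inverse of char_to_hex: decode the hex string as latin-1."""
    return bytes.fromhex(hex_str).decode("latin-1")


def char_to_bits(ch: str) -> str:
    """Eight-character bit string of the single latin-1 byte for ch."""
    # Indexing a bytes object yields an int, which format() zero-pads to 8 bits.
    return format(ch.encode("latin-1")[0], "08b")


def bits_to_char(bits: str) -> str:
    """Inverse of char_to_bits: parse the bit string and decode as latin-1."""
    return bytes([int(bits, 2)]).decode("latin-1")


ch = chr(140)
print(char_to_hex(ch))                 # 8c
print(char_to_bits(ch))                # 10001100
print(hex_to_char("8c") == ch)         # True
print(bits_to_char("10001100") == ch)  # True

# For comparison, the default UTF-8 encoding needs two bytes here.
print(len(ch.encode()))                # 2
```

If the text may contain characters above 255, a single byte per character is not possible in any encoding, so UTF-8 with its variable length is the usual choice and the hex string simply gets longer.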