I am sure the solution is easy, but I haven't found it online. I might be just not searching as I should.
I want to be able to easily switch between a character and its byte and hex representations in Python. I usually do it as follows:
If I have a character, say chr(97), the letter "a", I can get its byte representation by
byte_character=chr(97).encode()
Or its hex representation by
hex_character=chr(97).encode().hex()
and to go back I can use
bytes.fromhex(hex_character).decode()
byte_character.decode()
This works fine for most characters, but for some of them the encoding uses more than one byte. An example is chr(140),
which when encoded gives 2 bytes:
chr(140).encode()
gives
b'\xc2\x8c'
rather than just b'\x8c'
as I expect. Can you explain what I am doing wrong?
If all you need is the 0 .. 255 byte range, you can use the latin1
(ISO-8859-1) encoding:
>>> chr(140).encode('latin1')
b'\x8c'
>>> chr(255).encode('latin1')
b'\xff'
>>> chr(256).encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0100' in position 0: ordinal not in range(256)
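For the round trip described in the question, the same encoding has to be named in both directions. Here is a minimal sketch, assuming only code points 0 .. 255 are needed; the helper names char_to_hex and hex_to_char are made up for illustration and are not part of any library:

```python
def char_to_hex(ch: str) -> str:
    """Encode text in the 0..255 range as hex, two digits per character."""
    return ch.encode('latin1').hex()


def hex_to_char(hex_str: str) -> str:
    """Decode hex digits back to text, one character per byte."""
    return bytes.fromhex(hex_str).decode('latin1')


hex_character = char_to_hex(chr(140))     # one byte, so two hex digits
round_trip = hex_to_char(hex_character)   # the original character again
print(hex_character, round_trip == chr(140))
```

Note that calling .decode() with no argument on b'\x8c' would use UTF-8 and raise UnicodeDecodeError, since 0x8c on its own is not a valid UTF-8 sequence, so 'latin1' is passed explicitly when decoding as well.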
Your original attempt was using the UTF-8 encoding by default, which emits multiple bytes for code points above 127:
>>> chr(140).encode()
b'\xc2\x8c'
>>> chr(127).encode()
b'\x7f'
>>> chr(128).encode()
b'\xc2\x80'
>>> chr(12345).encode()
b'\xe3\x80\xb9'
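To see where the byte count changes, here is a small sketch that encodes the code points on either side of each UTF-8 length boundary; the sample values and variable names are chosen only for illustration:

```python
# Code points just below and at each UTF-8 length boundary.
samples = (127, 128, 2047, 2048, 65535, 65536)

# Map each code point to the number of bytes UTF-8 uses for it.
utf8_lengths = {cp: len(chr(cp).encode('utf-8')) for cp in samples}

for cp, n_bytes in utf8_lengths.items():
    print(f"U+{cp:04X} -> {n_bytes} byte(s): {chr(cp).encode('utf-8').hex()}")
```

Only code points up to 127 fit in a single byte under UTF-8, which is why chr(140) comes out as two bytes.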