python string python-3.x porting hexdump

Write different hex-values in Python2 and Python3

I'm currently porting a Python2 script to Python3 and have problems with this line:

print('\xfe')

When I run it with Python2 (python test.py > test.out), the file consists of the hex-values FE 0A, as expected.

But when I run it with Python3 (python3 test.py > test.out), the file consists of the hex-values C3 BE 0A.

What's going wrong here? How can I get the desired output FE 0A with Python3?

Solution

The byte-sequence C3 BE is the UTF-8 encoded representation of the character U+00FE.

Python 2 handles strings as a sequence of bytes rather than characters. So '\xfe' is a str object containing one byte.

In Python 3, strings are sequences of (Unicode) characters. So the code '\xfe' is a string containing one character. When you print the string, it must be encoded to bytes. Since your environment chose a default encoding of UTF-8, it was encoded accordingly.
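To make the difference concrete, here is a small Python 3 sketch that encodes the same one-character string with two codecs. The variable names are only illustrative.

```python
# '\xfe' in Python 3 is a str holding one character, U+00FE.
ch = '\xfe'

# Encoding turns characters into bytes; the result depends on the codec.
utf8_bytes = ch.encode('utf-8')      # two bytes: C3 BE
latin1_bytes = ch.encode('latin-1')  # one byte:  FE

print(len(ch), utf8_bytes.hex(), latin1_bytes.hex())
```

With a UTF-8 default encoding, print() performs the first of these conversions implicitly, which is where the C3 BE in the output file comes from.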

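How to solve this depends on your data. Is it bytes or characters? If bytes, write them to the binary layer of standard output: sys.stdout.buffer.write(b'\xfe\n'). Note that print(b'\xfe') does not do this in Python 3; it prints the repr b'\xfe' as text. If it is characters, but you want a different encoding, encode the string accordingly and write the resulting bytes: sys.stdout.buffer.write('\xfe\n'.encode('latin1')).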
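A minimal sketch of writing raw bytes in Python 3, assuming the goal is to get exactly FE 0A into the output. Keep in mind that print() on a bytes object emits its repr as text, not the bytes themselves, so the bytes have to go to a binary stream. The helper name write_raw is hypothetical, and the demo writes to an in-memory io.BytesIO so the result can be inspected; in the real script you would pass sys.stdout.buffer instead.

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes unchanged to a binary stream.

    When no stream is given, fall back to sys.stdout.buffer, the binary
    layer underneath the text-mode sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(data)
    stream.flush()


# print(b'\xfe') would output this text rather than the byte FE:
text_form = repr(b'\xfe')

# Demo against an in-memory binary stream.
demo = io.BytesIO()
write_raw(b'\xfe\n', demo)                   # data that is already bytes
write_raw('\xfe\n'.encode('latin-1'), demo)  # characters, encoded explicitly
```

In test.py itself, a call such as write_raw(b'\xfe\n') followed by python3 test.py > test.out should then produce a file containing FE 0A.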