
Python3 handling non-ASCII characters in a weird way


I was trying to solve a pwnable challenge with Python 3. For that, I need to print some characters that are outside the ASCII range.

Python 3 converts these characters into some weird Unicode byte sequence instead.

For example, if I print "\xff" in Python 3, I get this:

root@kali:~# python3 -c 'print("\xff")' | xxd
00000000: c3bf 0a                                  ...

\xff gets converted to \xc3\xbf
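A quick way to see where the extra byte comes from is to encode the character by hand. This is a minimal sketch, assuming any Python 3.x interpreter. In Python 3, "\xff" is one character (U+00FF), and its UTF-8 encoding happens to be two bytes, while Latin-1 maps it to the single byte 0xff:

```python
# In Python 3, "\xff" is a str holding one character: U+00FF
s = "\xff"
print(len(s))  # 1

# UTF-8 encodes U+00FF as two bytes, matching the c3 bf that xxd showed
utf8_bytes = s.encode("utf-8")
print(utf8_bytes)  # b'\xc3\xbf'

# Latin-1 maps every code point below 256 to the byte with the same value
latin1_bytes = s.encode("latin-1")
print(latin1_bytes)  # b'\xff'
```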

But in Python 2, it works as expected:

root@kali:~# python -c 'print("\xff")' | xxd
00000000: ff0a                                     ..

So how can I print it like that in Python 3?


Solution

  • In Python 2, print '\xff' writes a byte string directly to the terminal, so you get exactly the byte you printed.

    In Python 3, print('\xff') encodes the Unicode character U+00FF for the terminal using the encoding of sys.stdout, which in your case is UTF-8. In UTF-8, U+00FF is the two bytes \xc3\xbf.

    To output raw bytes to the terminal in Python 3, you can't use print, but you can skip the encoding step by writing a byte string to the underlying binary buffer:

    python3 -c "import sys; sys.stdout.buffer.write(b'\xff\n')"
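The same idea can be wrapped for use inside a script. This is a sketch, not the only approach: write_raw and latin1_text_stream are hypothetical helper names chosen for illustration. The first writes bytes unchanged to a binary stream (sys.stdout.buffer by default). The second is an alternative that keeps using print but wraps the binary stream in an io.TextIOWrapper with the Latin-1 encoding, which turns each character U+0000 to U+00FF into the single byte with the same value. The demo writes to io.BytesIO buffers instead of the terminal so the resulting bytes can be inspected:

```python
import io
import sys


def write_raw(data: bytes, stream=None) -> None:
    """Hypothetical helper: write bytes unchanged to a binary stream."""
    if stream is None:
        stream = sys.stdout.buffer  # the binary layer underneath sys.stdout
    stream.write(data)
    stream.flush()


def latin1_text_stream(binary_stream):
    """Hypothetical helper: a text stream that encodes str as Latin-1."""
    # newline="\n" disables newline translation, so "\n" stays the byte 0x0a
    return io.TextIOWrapper(binary_stream, encoding="latin-1", newline="\n")


# Option 1: write the bytes directly (no encoding step at all)
raw_buf = io.BytesIO()
write_raw(b"\xff\n", raw_buf)
print(raw_buf.getvalue())  # b'\xff\n'

# Option 2: keep using print, but encode with Latin-1 instead of UTF-8
text_buf = io.BytesIO()
wrapper = latin1_text_stream(text_buf)
print("\xff", file=wrapper)
wrapper.flush()
wrapper.detach()  # release text_buf so closing the wrapper won't close it
print(text_buf.getvalue())  # b'\xff\n'
```

For the real terminal, Python 3.7+ also offers sys.stdout.reconfigure(encoding="latin-1"), and setting the environment variable PYTHONIOENCODING=latin-1 before starting the interpreter has a similar effect. Both only help for byte values up to 0xff, since Latin-1 cannot encode characters above U+00FF.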