Tags: python, hex, hexdump

Python 2 vs. Python 3: strings containing hexadecimal escapes


Consider this, using Python 3.4:

% python3 --version
Python 3.4.2
% echo `python3 -c "print('a' * 72 + '\xff\xbe\xbf\xff')"` | hexdump -x
0000000    6161    6161    6161    6161    6161    6161    6161    6161
*
0000040    6161    6161    6161    6161    bfc3    bec2    bfc2    bfc3
0000050    000a                                                        
0000051

And this one, using Python 2.7.9:

% python2 --version
Python 2.7.9
% echo `python2 -c "print('a' * 72 + '\xff\xbe\xbf\xff')"` | hexdump -x
0000000    6161    6161    6161    6161    6161    6161    6161    6161
*
0000040    6161    6161    6161    6161    beff    ffbf    000a        
000004d

Is this really a bug in the Python 3.4 implementation?


Solution

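  • This is not a bug. Python 2’s plain '-quoted strings are strings of bytes; Python 3’s are strings of Unicode characters. The equivalent of each in the other version is bytes (b'literal') in Python 3 and unicode (u'literal') in Python 2. In Python 3, '\xff' is the character U+00FF, and print encodes it for output (as UTF-8 in the transcript above), which produces the two bytes c3 bf instead of the single byte ff. To emit the raw bytes, write a bytes object to the underlying binary stream: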

    % python3 -c "from sys import stdout; stdout.buffer.write(b'a' * 72 + b'\xff\xbe\xbf\xff\n')" | hexdump -x
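
The difference can also be checked without a shell pipeline. The following is a minimal Python 3 sketch, assuming stdout uses UTF-8 as it appears to in the question; `to_hex` is a hypothetical helper introduced only for this illustration. It shows that encoding the four characters as UTF-8 yields the eight bytes seen in the Python 3 dump, while Latin-1 maps them one-to-one onto the four raw bytes. Note that `hexdump -x` prints 16-bit words, so on a little-endian machine the byte pair c3 bf is displayed as bfc3.

```python
import binascii

# Python 3: a plain literal is a str of code points, a b'' literal is raw bytes.
text = '\xff\xbe\xbf\xff'   # 4 characters: U+00FF, U+00BE, U+00BF, U+00FF
raw = b'\xff\xbe\xbf\xff'   # 4 bytes: ff be bf ff


def to_hex(data):
    """Return the bytes in `data` as a lowercase hex string (illustrative helper)."""
    return binascii.hexlify(data).decode('ascii')


# What print() emits when stdout is UTF-8 (an assumption): two bytes per character.
utf8_hex = to_hex(text.encode('utf-8'))

# Latin-1 maps U+0000..U+00FF one-to-one onto single bytes.
latin1_bytes = text.encode('latin-1')

print(utf8_hex)              # c3bfc2bec2bfc3bf
print(latin1_bytes == raw)   # True
print(len(text), len(text.encode('utf-8')))  # 4 8
```

If the data is really binary, keeping it as bytes from the start (as in the one-liner above) avoids depending on the terminal's encoding at all.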