I am trying to port Python2 protocol code to Python3. How do I get bytestring formatters for built-in types? In Python2, we could do:
>>> b'%s' % None
'None'
>>> b'%s' % 15
'15'
>>> b'%s' % []
'[]'
The same code in Python3 gives:
>>> b'%s' % None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'NoneType'
>>> b'%s' % 15
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'int'
>>> b'%s' % []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires bytes, or an object that implements __bytes__, not 'list'
How do I install the standard bytestring formatters for built-in types?
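"Bytestring formatters for built-in types" aren't a thing in Python 3. Python 3 keeps text (str) and binary data (bytes) as separate types, and object representations are text. In bytes formatting, %s is an alias for %b, which accepts only bytes-like objects or objects that implement __bytes__, as the error messages say.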
Instead, you can explicitly encode a string representation of an object into bytes, for example:
>>> ('%s' % None).encode('ascii')
b'None'
The default representations of the built-in Python types are encodable as ASCII, though the data they contain might not be, for example:
>>> L = []
>>> ('%s' % L).encode('ascii')
b'[]'
>>> L.append('café ∀ 👍')
>>> ('%s' % L).encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)
In that case, you can specify a different encoding, for example UTF-8:
>>> ('%s' % L).encode('utf-8')
b"['caf\xc3\xa9 \xe2\x88\x80 \xf0\x9f\x91\x8d']"
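If the old code has many such call sites, the explicit encoding step can be wrapped in a small helper. The sketch below is only an illustration: the name to_bytes, the UTF-8 default, and the SET command line are assumptions made for the example, not part of any standard library or real protocol.

```python
def to_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: mimic what Python 2's b'%s' % obj produced."""
    # Bytes-like input is passed through unchanged (copied to immutable bytes).
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)
    # Everything else is rendered as text first, then encoded explicitly.
    return str(obj).encode(encoding)


# A made-up protocol line: each value is converted before bytes formatting.
line = b"SET %s %s\r\n" % (to_bytes("counter"), to_bytes(15))
print(line)
```

Using str(obj) rather than '%s' % obj avoids the surprise where a tuple argument is unpacked as multiple format arguments.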
Or, if the result is going to be used in a Python context, you could use the ascii conversion, %a:
>>> ('%a' % L).encode('ascii')
b"['caf\\xe9 \\u2200 \\U0001f44d']"
This replaces every non-ASCII character with a backslash escape sequence, so the result can always be encoded as ASCII.
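On Python 3.5 and later, bytes %-formatting itself accepts a few conversions besides %b, so the round trip through str is not always needed. A minimal sketch, assuming Python 3.5+ (the variable names are just for illustration):

```python
items = ['café ∀ 👍']

# %a applies ascii() to the argument and yields bytes directly.
# The argument is wrapped in a tuple so it is always treated as one value.
escaped = b'%a' % (items,)
print(escaped)

# Numeric conversions such as %d also work on a bytes format string.
number = b'%d' % 15
print(number)

# %a works for any object, including None.
nothing = b'%a' % (None,)
print(nothing)
```

Note that %a gives the repr-style form, so a plain string comes out with quotes around it; that differs from what Python 2's %s produced for str arguments.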