When creating bytes with the "b" prefix before a string, what encoding does Python use?

From the python doc:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

I know that I can create a bytes object with a b prefix expression like b'cool', and that this converts the Unicode string 'cool' into bytes. I'm also aware that a bytes instance can be created with the bytes() function, but then you need to specify the encoding argument: bytes('cool', 'utf-8').

From my understanding, I need to use one of the encoding rules if I want to translate a string into a sequence of bytes. I have done some experiments, and it seems the b prefix converts a string into bytes using UTF-8 encoding:

>>> a = bytes('a', 'utf-8')
>>> b'a' == a
True
>>> b = bytes('a', 'utf-16')
>>> b'a' == b
False

My question is: when creating a bytes object through the b prefix, what encoding does Python use? Is there any documentation that specifies this? Does it use UTF-8 or ASCII by default?

Solution

The bytes type can hold arbitrary data. For example, (the beginning of) a JPEG image:

>>> with open('Bilder/19/01/IMG_3388.JPG', 'rb') as f:
...     head = f.read(10)

You should think of it as a sequence of integers. That's also how the type behaves in many aspects:

>>> list(head)
[255, 216, 255, 225, 111, 254, 69, 120, 105, 102]
>>> head[0]
255
>>> sum(head)
1712
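The JPEG path above only exists on the author's machine, so here is a minimal sketch that rebuilds the same ten bytes from the integer list shown above and checks the sequence-of-integers behaviour without any file. The variable name `head` is simply reused from the example.

```python
# Rebuild the ten header bytes from the integers printed by list(head) above.
head = bytes([255, 216, 255, 225, 111, 254, 69, 120, 105, 102])

print(head)         # b'\xff\xd8\xff\xe1o\xfeExif'
print(head[0])      # 255: indexing yields an int, not a one-character string
print(head[6:])     # b'Exif': slicing yields a new bytes object
print(sum(head))    # 1712: arithmetic works on the integer items
print(255 in head)  # True: membership is tested with integers
```

Note the asymmetry: indexing gives an `int`, while slicing gives `bytes`.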

For reasons of convenience (and for historical reasons, I guess), the standard representation of bytes objects, and their literals, resemble strings:

>>> head
b'\xff\xd8\xff\xe1o\xfeExif'

It shows printable ASCII characters as themselves and uses \xNN escapes otherwise (plus a few special escapes such as \n, \t and \\). This is convenient if the bytes object represents text:

>>> 'Zoë'.encode('utf8')
b'Zo\xc3\xab'
>>> 'Zoë'.encode('utf16')
b'\xff\xfeZ\x00o\x00\xeb\x00'
>>> 'Zoë'.encode('latin1')
b'Zo\xeb'

When you type a bytes literal, Python interprets each character as its ASCII code. Non-ASCII characters are not allowed and must be written as escapes. Characters in the ASCII range are encoded the same way in UTF-8, which is why you observed that b'a' == bytes('a', 'utf8'). A less misleading comparison would be b'a' == bytes('a', 'ascii').
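A short sketch checks these claims directly. The helper name `compiles` is hypothetical. It just wraps the built-in `compile()` to see whether a literal is accepted. The UTF-16 lines also explain the question's `False` result: UTF-16 needs two bytes per character here, and the plain `'utf-16'` codec additionally prepends a byte order mark.

```python
def compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as one Python expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True


# Every ASCII character becomes the same single byte under ASCII and UTF-8,
# which is why b'a' also equals bytes('a', 'utf8').
for code in range(128):
    ch = chr(code)
    assert ch.encode('ascii') == ch.encode('utf-8') == bytes([code])

# UTF-16 uses two bytes per character here, and the plain 'utf-16' codec
# also prepends a 2-byte byte order mark, so the result can never equal b'a'.
print('a'.encode('utf-16-le'))    # b'a\x00'
print(len('a'.encode('utf-16')))  # 4: BOM plus one 2-byte code unit

# Inside b'...', ASCII characters are allowed; other characters are rejected,
# and byte values of 128 or more must be written as escapes.
print(compiles("b'a'"))       # True
print(compiles("b'\u00eb'"))  # False: the source contains a literal 'ë'
print(compiles(r"b'\xeb'"))   # True: the escape spells byte 235
```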