As we know, Python 3 treats every character in a str as a Unicode code point.
type('\x0d')
<class 'str'>
type(b'\x0d')
<class 'bytes'>
The ASCII value of b'\x0d' is 13, which is 0000 1101 in binary. Is '\x0d' stored in memory in that same form, 0000 1101, or not? Are the two stored identically?
Digging further only confused me more:
#My python version
python3 --version
Python 3.9.2
#in python cli
len(b'\x0d')
1
import sys
print(sys.getsizeof(b'\x0d'))
34
So b'\x0d' is not stored simply as 0000 1101 in memory?
print(sys.getsizeof('\x0d'))
50
Using sys.getsizeof made me realize that str and bytes are stored as different objects in Python 3. Is it right that b'\x0d' being stored as 0000 1101 is only true at some abstract level, and that b'\x0d' in fact occupies 34 bytes of the PC's memory in CPython?

You can look at the memory contents of each object in CPython if you are curious. The size of the object can be queried with sys.getsizeof(obj), and the memory address happens to be the id(obj) of the object in the current implementation. The ctypes module has a string_at function that takes a memory address and a size and reads that memory:
>>> import sys
>>> import ctypes
>>> x = '\x0d'
>>> ctypes.string_at(id(x), sys.getsizeof(x)).hex()
'02ca9a3b0000000070a427b3fb7f00000100000000000000c879dc5ef7a24b87e40000000000000000000000000000000d00'
>>> x = b'\x0d'
>>> ctypes.string_at(id(x), sys.getsizeof(x)).hex()
'01ca9a3b00000000b0b126b3fb7f00000100000000000000c879dc5ef7a24b870d00'
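To make the two dumps easier to read, here is a rough sketch that splits each memory image into its leading header fields and its last two bytes. The describe helper is hypothetical, and the field order (reference count, type pointer, length, cached hash) is an assumption about a default 64-bit CPython build. It is not a documented API and may differ on other builds or versions.

```python
import ctypes
import struct
import sys

def describe(obj):
    """Split a CPython str/bytes memory image into header fields and tail.

    Assumes a default CPython build where the object starts with:
    reference count, type pointer, length, cached hash.
    """
    raw = ctypes.string_at(id(obj), sys.getsizeof(obj))
    # "n" is a native Py_ssize_t-sized integer, "P" is a native pointer.
    refcnt, type_ptr, length, cached_hash = struct.unpack_from("nPnn", raw, 0)
    return {
        "refcount": refcnt,
        "type_matches": type_ptr == id(type(obj)),  # pointer to str or bytes
        "length": length,
        "cached_hash": cached_hash,  # -1 if the hash was never computed
        "tail": raw[-2:].hex(),      # payload byte plus terminator
    }

for value in ('\x0d', b'\x0d'):
    print(type(value).__name__, describe(value))
```

The type pointer is the part that differs between the two objects; the length and the trailing payload are where they agree.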
Above you can see that the two objects have different memory images but, in this case at least, the data is stored in the last bytes, 0d 00, and is identical in both. The 0d is the value 13 (0000 1101 in binary), and it matches because CPython uses the 8-bit latin-1 encoding to store this Unicode string (see PEP 393 for details). The trailing 00 is a null terminator that CPython adds as another implementation detail. The other bytes are header data belonging to the implementation of the PyBytes and PyUnicode objects in CPython.
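The "in this case, at least" caveat matters: under PEP 393, CPython picks 1, 2, or 4 bytes per character for a str depending on the largest code point it contains, so the payload only matches the bytes object when every character fits in 8 bits. A minimal sketch of how to observe this, using a hypothetical bytes_per_char helper that compares the sizes of two strings of different lengths so the fixed header cancels out:

```python
import sys

def bytes_per_char(ch):
    """Estimate per-character storage of a str built only from ch (CPython)."""
    short = ch * 2
    long = ch * 102
    # The header size is the same for both, so the difference is
    # 100 characters' worth of payload.
    return (sys.getsizeof(long) - sys.getsizeof(short)) // 100

for ch in ('\x0d', '\xe9', '\u20ac', '\U0001f600'):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

A bytes object, by contrast, always uses exactly one byte per element, so len(b) plus a fixed overhead accounts for its sys.getsizeof result.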