Tags: python-3.x, unicode, cpython

Are str and bytes stored the same way in memory in Python 3?


As we know, Python 3 treats every character in a string as a Unicode code point.

>>> type('\x0d')
<class 'str'>
>>> type(b'\x0d')
<class 'bytes'>

The ASCII code of b'\x0d' is 13, which is 0000 1101 in binary. Is '\x0d' stored in the same form, 0000 1101, or not? Are the two stored identically in memory?
Digging further only confused me more:

# My Python version
$ python3 --version
Python 3.9.2
# In the Python REPL
>>> len(b'\x0d')
1
>>> import sys
>>> print(sys.getsizeof(b'\x0d'))
34

So b'\x0d' is not stored as 0000 1101 in memory?

>>> print(sys.getsizeof('\x0d'))
50

Using sys.getsizeof leads me to understand that:

  1. str and bytes are stored as different kinds of objects in Python 3.
  2. When we say that b'\x0d' is stored as 0000 1101 in memory, that is true only at an abstract level; in fact, b'\x0d' occupies 34 bytes of the PC's memory in CPython 3?

Solution

  • You can look at the memory contents of each object in CPython if you are curious. The size of an object can be queried with sys.getsizeof(obj), and in the current CPython implementation id(obj) happens to be the object's memory address. The ctypes module has a string_at function that takes a memory address and a size and reads that memory:

    >>> import sys
    >>> import ctypes
    >>> x = '\x0d'
    >>> ctypes.string_at(id(x), sys.getsizeof(x)).hex()
    '02ca9a3b0000000070a427b3fb7f00000100000000000000c879dc5ef7a24b87e40000000000000000000000000000000d00'
    >>> x = b'\x0d'
    >>> ctypes.string_at(id(x), sys.getsizeof(x)).hex()
    '01ca9a3b00000000b0b126b3fb7f00000100000000000000c879dc5ef7a24b870d00'
    

    As shown above, the two objects have different memory images, but in this case, at least, the data is stored in the last bytes, 0d 00, and is identical in both. That is because CPython stores this string with one byte per character, using the Latin-1 8-bit encoding (see PEP 393 for details). The trailing null terminator is another CPython implementation detail. The other bytes are header data belonging to CPython's implementation of the PyBytes and PyUnicode objects.
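
If you want to check these points yourself, here is a minimal sketch, assuming CPython 3.3 or later (where PEP 393 applies). The helper names tail_bytes and bytes_per_char are made up for this illustration and are not part of any library. It compares the trailing bytes of both objects, prints the bit pattern of the payload byte, and estimates how many bytes a str spends per character as the largest code point grows.

```python
import ctypes
import sys


def tail_bytes(obj, n):
    """Return the last n bytes of obj's memory image.

    Hypothetical helper; relies on the CPython detail that id(obj)
    is the object's memory address.
    """
    raw = ctypes.string_at(id(obj), sys.getsizeof(obj))
    return raw[-n:]


def bytes_per_char(ch):
    """Estimate storage per character by growing a string by one character.

    Hypothetical helper; both strings are built at run time so that only
    the payload size differs between them.
    """
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)


# Payload byte plus null terminator: the same for str and bytes here.
str_tail = tail_bytes('\x0d', 2)
bytes_tail = tail_bytes(b'\x0d', 2)
print(str_tail.hex(), bytes_tail.hex())  # 0d00 0d00

# Bit pattern of the single payload byte (decimal 13).
bits = format(b'\x0d'[0], '08b')
print(bits)  # 00001101

# PEP 393: the width per character depends on the largest code point
# in the string (1, 2, or 4 bytes).
widths = [bytes_per_char(c) for c in ('\x0d', '\u20ac', '\U0001F600')]
print(widths)  # [1, 2, 4]
```

So the identical 0d 00 tail is specific to strings whose characters all fit in one byte; a str containing '\u20ac' or an emoji uses a wider representation and would no longer match the bytes object byte for byte.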