Search code examples
pythonpython-2.7cpythonpython-internals

Why is the size of 2⁶³ 36 bytes, but 2⁶³-1 is only 24 bytes?


Everything in Python is an object. So the size of an int in Python will be larger than usual.

>>> sys.getsizeof(int())
24

OK, but why does it take 12 more bytes for 2⁶³ compared too 2⁶³ - 1 and not just one?

>>> sys.getsizeof(2**63)
36
>>> sys.getsizeof(2**62)
24

I get that 2⁶³ is a long and 2⁶³-1 an int, but why 12 bytes of difference?

No more intuitive, I tried some other things:

>>> a = 2**63
>>> a -= 2**62
>>> sys.getsizeof(a)
36

a is still stored as a long even if it could be in an int now. So that's not surprising. But:

>>> a -= (2**63 - 1)
>>> a = 2**63
>>> a -= (2**63 - 1)
>>> a
1L
>>> sys.getsizeof(a)
28

A new size.

>>> a = 2**63
>>> a -= 2**63
>>> a
0L
>>> sys.getsizeof(a)
24

Back to 24 bytes, but still with a long.

Last thing I got:

>>> sys.getsizeof(long())
24

Question:

How does the memory storage work in those scenarios?

Sub-questions:

Why is there a gap of 12 bytes to add what our intuition tells us is just 1 bit?

Why are int() and long() 24 bytes, but long(1) is already 28 bytes and int(2⁶²)?

NB: Python 3.X is working a bit differently, but not more intuitively. Here I focused on Python 2.7; I did not test on prior versions.


Solution

  • why does it get 12 more bytes for 2⁶³ compared too 2⁶³ - 1 and not just one?

    On an LP64 system1, a Python 2 int consists of exactly three pointer-sized pieces:

    • type pointer
    • reference count
    • actual value, a C long int

    That's 24 bytes in total. On the other hand, a Python long consists of:

    • type pointer
    • reference count
    • digit count, a pointer-sized integer
    • inline array of value digits, each holding 30 bits of value, but stored in 32-bit units (one of the unused bits gets used for efficient carry/borrow during addition and subtraction)

    2**63 requires 64 bits to store, so it fits in three 30-bit digits. Since each digit is 4 bytes wide, the whole Python long will take 24+3*4 = 36 bytes.

    In other words, the difference comes from long having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int yields the observed 12-byte difference.

    It is worth noting that Python 3 only has one integer type, called int, which is variable-width, and implemented the same way as Python 2 long.


    1 64-bit Windows differs in that it retains a 32-bit long int, presumably for source compatibility with a large body of older code that used char, short, and long as "convenient" aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64 or (on newer compiler versions) long long or int64_t. Since Python 2 internally depends on Python int fitting into a C long in various places, sys.maxint remains 2**31-1, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.