Everything in Python is an object. So the size of an int in Python will be larger than usual.
>>> sys.getsizeof(int())
24
OK, but why does it take 12 more bytes for 2⁶³
compared too 2⁶³ - 1
and not just one?
>>> sys.getsizeof(2**63)
36
>>> sys.getsizeof(2**62)
24
I get that 2⁶³
is a long and 2⁶³-1
an int, but why 12 bytes of difference?
No more intuitive, I tried some other things:
>>> a = 2**63
>>> a -= 2**62
>>> sys.getsizeof(a)
36
a
is still stored as a long even if it could be in an int now. So that's not surprising. But:
>>> a -= (2**63 - 1)
>>> a = 2**63
>>> a -= (2**63 - 1)
>>> a
1L
>>> sys.getsizeof(a)
28
A new size.
>>> a = 2**63
>>> a -= 2**63
>>> a
0L
>>> sys.getsizeof(a)
24
Back to 24 bytes, but still with a long.
Last thing I got:
>>> sys.getsizeof(long())
24
Question:
How does the memory storage work in those scenarios?
Sub-questions:
Why is there a gap of 12 bytes to add what our intuition tells us is just 1 bit?
Why are int()
and long()
24 bytes, but long(1)
is already 28 bytes and int(2⁶²)
?
NB: Python 3.X is working a bit differently, but not more intuitively. Here I focused on Python 2.7; I did not test on prior versions.
why does it get 12 more bytes for 2⁶³ compared too 2⁶³ - 1 and not just one?
On an LP64 system1, a Python 2 int
consists of exactly three pointer-sized pieces:
long int
That's 24 bytes in total. On the other hand, a Python long
consists of:
2**63 requires 64 bits to store, so it fits in three 30-bit digits. Since each digit is 4 bytes wide, the whole Python long
will take 24+3*4 = 36 bytes.
In other words, the difference comes from long
having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long
occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int
yields the observed 12-byte difference.
It is worth noting that Python 3 only has one integer type, called int
, which is variable-width, and implemented the same way as Python 2 long
.
long int
, presumably for source compatibility with a large body of older code that used char
, short
, and long
as "convenient" aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64
or (on newer compiler versions) long long
or int64_t
. Since Python 2 internally depends on Python int
fitting into a C long in various places, sys.maxint
remains 2**31-1
, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.