Search code examples
pythonpython-3.xlistpython-internals

Memory Size of list python


I was experimenting something with lists and I came around self-referencing lists. I searched through SO and got some of my basic questions about it answered. But when I tried to get the memory size of self referencing lists of different lengths, I found an interesting pattern

Code to reproduce:

import sys

memory_size = {}

for length in range(50):
    lst = []
    for length_loop in range(length):
        lst.append(lst)
    memory_size[length] = sys.getsizeof(lst)

Value of memory_size:

{0: 64, 1: 96, 2: 96, 3: 96, 4: 96, 5: 128, 6: 128, 7: 128, 8: 128, 9: 192, 10: 192, 11: 192, 12: 192, 13: 192, 14: 192, 15: 192, 16: 192, 17: 264, 18: 264, 19: 264, 20: 264, 21: 264, 22: 264, 23: 264, 24: 264, 25: 264, 26: 344, 27: 344, 28: 344, 29: 344, 30: 344, 31: 344, 32: 344, 33: 344, 34: 344, 35: 344, 36: 432, 37: 432, 38: 432, 39: 432, 40: 432, 41: 432, 42: 432, 43: 432, 44: 432, 45: 432, 46: 432, 47: 528, 48: 528, 49: 528}

On plotting the above data points

enter image description here

Python 3.7.3 (default, Mar 27 2019, 16:54:48)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

Why does the memory size of self-referencing list remain constant for certain range of lengths and increase after certain length?

And also the increase the increase in memory size is different


Solution

  • Lists always display this pattern if they are grown with append.

    A key point to understand, sys.getsizeof doesn't account for the objects referenced in the list, only the size of the list object itself. Now, Python list objects are implemented as array lists underneath the hood, so essentially there is a PyObject header (like, 16 byte overhead or some such), then a primitive array of PyObject pointers (which is why they can be heterogenous, and reference themselves).

    This underlying array is overallocated, and it's re-sized in such a way to guarantee amortized constant-time .append operations. Stated another way, Python list objects have amortized constant-time .append, so doing something like for x in range(N): my_list.append(0) is a linear time operation, because the underlying buffer is not re-allocated on each iteration.

    Look, you see the same pattern with any object, like None:

    In [24]: import sys
        ...:
        ...: memory_size = {}
        ...:
        ...: for length in range(50):
        ...:     lst = []
        ...:     for length_loop in range(length):
        ...:         lst.append(None)
        ...:     memory_size[length] = sys.getsizeof(lst)
        ...:
    
    In [25]: memory_size
    Out[25]:
    {0: 72,
     1: 104,
     2: 104,
     3: 104,
     4: 104,
     5: 136,
     6: 136,
     7: 136,
     8: 136,
     9: 200,
     10: 200,
     11: 200,
     12: 200,
     13: 200,
     14: 200,
     15: 200,
     16: 200,
     17: 272,
     18: 272,
     19: 272,
     20: 272,
     21: 272,
     22: 272,
     23: 272,
     24: 272,
     25: 272,
     26: 352,
     27: 352,
     28: 352,
     29: 352,
     30: 352,
     31: 352,
     32: 352,
     33: 352,
     34: 352,
     35: 352,
     36: 440,
     37: 440,
     38: 440,
     39: 440,
     40: 440,
     41: 440,
     42: 440,
     43: 440,
     44: 440,
     45: 440,
     46: 440,
     47: 536,
     48: 536,
     49: 536}
    

    And just to convince you, here is your self-referening list:

    In [26]: import sys
        ...:
        ...: memory_size = {}
        ...:
        ...: for length in range(50):
        ...:     lst = []
        ...:     for length_loop in range(length):
        ...:         lst.append(lst)
        ...:     memory_size[length] = sys.getsizeof(lst)
        ...:
    
    In [27]: memory_size
    Out[27]:
    {0: 72,
     1: 104,
     2: 104,
     3: 104,
     4: 104,
     5: 136,
     6: 136,
     7: 136,
     8: 136,
     9: 200,
     10: 200,
     11: 200,
     12: 200,
     13: 200,
     14: 200,
     15: 200,
     16: 200,
     17: 272,
     18: 272,
     19: 272,
     20: 272,
     21: 272,
     22: 272,
     23: 272,
     24: 272,
     25: 272,
     26: 352,
     27: 352,
     28: 352,
     29: 352,
     30: 352,
     31: 352,
     32: 352,
     33: 352,
     34: 352,
     35: 352,
     36: 440,
     37: 440,
     38: 440,
     39: 440,
     40: 440,
     41: 440,
     42: 440,
     43: 440,
     44: 440,
     45: 440,
     46: 440,
     47: 536,
     48: 536,
     49: 536}
    

    The discrepencies in the individual size come down to things like Python version and system architecture (on a 32-bit system, pointers are 4 bytes instead of 8, for example, and different Python versions are free to change implementation details like the size of an empty list). Note, the above was run on Python 3.7, if I use another environemnt:

    (base) juanarrivillaga@173-11-109-137-SFBA ~ % python -c "import sys; print(f'{sys.version}\nEmpty List Size: {sys.getsizeof([])}')"
    3.8.1 (default, Jan  8 2020, 16:15:59)
    [Clang 4.0.1 (tags/RELEASE_401/final)]
    Empty List Size: 56