
Python Array Memory Footprint versus List


I am flipping through the book Fluent Python. It states that for a sequence made up entirely of numbers, an array is more efficient and faster than a list. From what I gather, it should also have less memory overhead. The book states: "A Python array is as lean as a C array."

I am curious as to why the array here shows a larger memory footprint than the list.

import array
from random import random
import sys

floats = array.array('d', (random() for i in range(10**7)))
L = [random() for i in range(10**7)]
print(sys.getsizeof(floats))
print(sys.getsizeof(L))

Output:

81940352
81528056

Solution

  • You just picked the wrong example. The point of using array is to store items whose native representation is smaller than a Python object reference (which seems to be 8 bytes here, as on a typical 64-bit build). For example:

    import sys
    from array import array
    from os import urandom
    a = array('B', urandom(1024))  # 1024 unsigned bytes, 1 byte each
    l = list(a)                    # 1024 references, 8 bytes each
    sys.getsizeof(a) # => 1155
    sys.getsizeof(l) # => 9328
    

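    Since doubles are also 8 bytes wide, there is no more compact way to store them than 8 bytes each, so the array and the list's table of references come out at about the same size. Keep in mind, though, that sys.getsizeof(L) counts only the list's references, not the float objects they point to; each of those objects takes additional memory that the array does not need.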
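A rough way to see the difference is to add the size of the float objects to the size of the list itself. This is a minimal sketch that assumes CPython and that every list element is a distinct float object (true here, since each is created separately); the names n, lst, list_shallow and list_total are illustrative only, and n is smaller than in the question so the script runs quickly.

```python
import sys
from array import array
from random import random

n = 10**5  # smaller than the question's 10**7 so this runs quickly

floats = array('d', (random() for _ in range(n)))
lst = list(floats)  # creates n separate float objects

# getsizeof on the container alone: the array holds the raw doubles,
# while the list holds only references to float objects.
array_total = sys.getsizeof(floats)
list_shallow = sys.getsizeof(lst)

# Add the size of each float object the list points to.
# Assumption: every element is a distinct object, so nothing is counted twice.
list_total = list_shallow + sum(sys.getsizeof(x) for x in lst)

print("array:", array_total)
print("list (references only):", list_shallow)
print("list (references + float objects):", list_total)
```

Exact numbers depend on the Python version and on how much spare capacity each container over-allocated, but the last figure should be several times larger than the first.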


    As for the rest of the claims in the book, take them with a grain of salt: code executed by the Python interpreter cannot be as fast as C. You still incur overhead whenever Python objects are written to or read from the array. What would be faster is performing one large operation over the entire array in a native function.
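To illustrate the last point, here is a hedged sketch comparing an element-by-element loop with a single bulk call. Serializing to bytes was picked only as a convenient example of a "big operation"; the helper names serialize_per_item and serialize_bulk are hypothetical, and the timings will vary by machine and Python version.

```python
import struct
import timeit
from array import array
from random import random

floats = array('d', (random() for _ in range(10**5)))

def serialize_per_item(values):
    # Each element is turned into a Python float object, then packed one at a time.
    return b''.join(struct.pack('d', x) for x in values)

def serialize_bulk(values):
    # A single call; the underlying buffer is copied in C without
    # creating a Python object per element.
    return values.tobytes()

# Both approaches produce the same bytes (native byte order, 8 bytes per double).
same = serialize_per_item(floats) == serialize_bulk(floats)

t_loop = timeit.timeit(lambda: serialize_per_item(floats), number=5)
t_bulk = timeit.timeit(lambda: serialize_bulk(floats), number=5)
print("identical output:", same)
print(f"per-item loop: {t_loop:.4f}s  bulk tobytes(): {t_bulk:.4f}s")
```

The bulk call would be expected to win by a wide margin, because the per-item version pays interpreter overhead for every element.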