Tags: python, dictionary, sys

Size of dictionary


I am trying to get the actual size of a dict in memory. I got some weird results and would appreciate your feedback.

import sys

a = {}
for i in range(2):
    a[i] = {}
    for j in range(1000):
        a[i][j] = j

sys.getsizeof(a), sys.getsizeof(a[0]), sys.getsizeof(a[1])

The result is 272, 49424 and 49424 bytes. I expected the size of a to be the sum of the sizes of a[0] and a[1].

But if I try the following:

a = {}
for i in range(2000):
    a[i] = [i, i, i]
sys.getsizeof(a)

The size of a is 196880 bytes. This dict has 2000 keys, whereas the first one has 2 keys, each mapping to a dict with 1000 keys.


Solution

  • sys.getsizeof only counts the memory of the object itself, not the objects it refers to. You need to determine the size of the dict plus the sizes of all its keys and values, recursively (I wish Python had a built-in function to do this). I have used variations of this recipe a number of times:

    import sys
    
    def get_size(obj, seen=None):
        """Recursively finds size of objects"""
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        # Important: mark as seen *before* entering recursion to gracefully
        # handle self-referential objects
        seen.add(obj_id)
        if isinstance(obj, dict):
            size += sum(get_size(v, seen) for v in obj.values())
            size += sum(get_size(k, seen) for k in obj.keys())
        elif hasattr(obj, '__dict__'):
            size += get_size(obj.__dict__, seen)
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum(get_size(i, seen) for i in obj)
        return size
    

    I have occasionally had to make versions of this that work for other custom types, NumPy arrays, and the like. Sadly, there's no perfect generic solution.
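
To see why the numbers in the question look odd, here is a minimal sketch (assuming CPython; exact byte counts vary between versions and platforms). It rebuilds the nested dict from the question, compares it with a hypothetical dict `b` that has the same two keys but `None` values, and then adds the inner dicts' own sizes by hand. The names `b`, `shallow` and `one_level` are mine, for illustration only.

```python
import sys

# Same nested structure as in the question: 2 keys, each holding a 1000-key dict.
a = {}
for i in range(2):
    a[i] = {}
    for j in range(1000):
        a[i][j] = j

# Hypothetical comparison dict: same two keys, but tiny values.
b = {}
for i in range(2):
    b[i] = None

# sys.getsizeof(a) covers the outer dict only; the inner dicts are not counted.
shallow = sys.getsizeof(a)

# Going one level deeper by hand: add the inner dicts' own (still shallow) sizes.
one_level = shallow + sum(sys.getsizeof(v) for v in a.values())

print(shallow == sys.getsizeof(b))  # the values' contents do not affect the outer size
print(one_level > shallow)
```

Even `one_level` is still an undercount, because the int keys and values stored in the inner dicts are separate objects too. That is the gap the recursive recipe above closes.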