python-3.x, memory, data-structures

Memory-efficient data structures in Python


I have a large number of identical dictionaries (identically structured: same keys, different values), which leads to two different memory problems:

  • dictionaries are resized geometrically as they grow, so each dictionary could be using up to twice the memory it needs.

  • dictionaries need to record their labels, so each dictionary is storing the keys for that dictionary, which is a significant amount of memory.

What is a good way that I can share the labels (so each label is not stored in the object), and compress the memory?
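
To put rough numbers on the overhead described above, a quick measurement with sys.getsizeof can help. This is only a minimal sketch: the three-key layout, the make_record helper, and the record count are made up for illustration, and the exact byte counts depend on the Python version and platform.

```python
import sys

# Hypothetical record layout: three fixed keys, different values per record.
KEYS = ("first", "second", "third")


def make_record(i):
    """Build one dictionary with the shared key set and per-record values."""
    return {key: f"{key}-{i}" for key in KEYS}


records = [make_record(i) for i in range(1000)]

# sys.getsizeof reports only the dict container itself (hash table and
# entry slots), not the key or value objects that it references.
per_dict = sys.getsizeof(records[0])
total = sum(sys.getsizeof(r) for r in records)

print("bytes per dict container:", per_dict)
print("bytes for", len(records), "dict containers:", total)
```

Every record pays for its own container, even though the set of keys never changes.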


Solution

  • One possible solution is based on the recordclass library:

    pip install recordclass
    
    >>> import sys
    >>> from recordclass import make_dataclass
    

    For a given set of labels, you create a class:

    >>> DataCls = make_dataclass('DataCls', 'first second third')
    >>> data = DataCls(first="red", second="green", third="blue")
    >>> print(data)
    DataCls(first='red', second='green', third='blue')
    >>> print('Memory size:', sys.getsizeof(data), 'bytes')
    Memory size: 40 bytes
    

    It is fast and uses minimal memory, which makes it suitable for creating millions of instances.

    The downside: it is a C extension and not part of the standard library, but it is available on PyPI.

    Addition: starting with recordclass version 0.15, there is a fast_new option for faster instance creation:

    >>> DataCls = make_dataclass('DataCls', 'first second third', fast_new=True)
    

    If you don't need keyword arguments, instance creation becomes about twice as fast. Starting with version 0.22 this is the default behavior, and the fast_new=True option can be omitted.

    P.S.: the author of the recordclass library is here.
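
If installing a C extension is not an option, the same idea of keeping the labels on the class instead of in every object can be sketched with the standard library's __slots__. Note that this is a different technique, not the recordclass API. SlotRecord is a hypothetical name chosen for illustration, and the sizes printed will vary with the Python version and platform.

```python
import sys


class SlotRecord:
    """Stdlib-only stand-in: field names are stored once, on the class."""

    # __slots__ replaces the per-instance __dict__ with fixed attribute slots.
    __slots__ = ("first", "second", "third")

    def __init__(self, first, second, third):
        self.first = first
        self.second = second
        self.third = third


as_dict = {"first": "red", "second": "green", "third": "blue"}
as_slots = SlotRecord("red", "green", "blue")

# Compare the container sizes only; the string values are shared objects
# and are not included in either figure.
dict_size = sys.getsizeof(as_dict)
slots_size = sys.getsizeof(as_slots)

print("dict container:", dict_size, "bytes")
print("__slots__ instance:", slots_size, "bytes")
```

The trade-off is that the set of attributes is fixed when the class is defined, which matches the "same keys, different values" case in the question.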