I have a large number of identically structured dictionaries (same keys, different values), which leads to two memory problems:
dictionaries over-allocate (their capacity grows geometrically), so each dictionary may be using up to twice the memory it actually needs.
dictionaries need to record their keys, so each dictionary stores its own copy of the keys, which adds up to a significant amount of memory.
What is a good way that I can share the labels (so each label is not stored in the object), and compress the memory?
I can offer the following solution to the problem, based on the recordclass library:
pip install recordclass
>>> import sys
>>> from recordclass import make_dataclass
For a given set of labels you create a class:
>>> DataCls = make_dataclass('DataCls', 'first second third')
>>> data = DataCls(first="red", second="green", third="blue")
>>> print(data)
DataCls(first='red', second='green', third='blue')
>>> print('Memory size:', sys.getsizeof(data), 'bytes')
Memory size: 40 bytes
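For comparison, the equivalent plain dict is far larger, because every instance carries its own hash table with references to the keys (exact byte counts vary by CPython version). A quick check:

```python
import sys

# The same record stored as a plain dict: each instance keeps its own
# hash table, so it is much bigger than the 40-byte recordclass
# instance above.
d = {"first": "red", "second": "green", "third": "blue"}
print("dict size:", sys.getsizeof(d), "bytes")
```

Note that sys.getsizeof only measures the container itself, not the key and value strings it references.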
It's fast and uses minimal memory, making it suitable for creating millions of instances.
The downside: it's a C extension and not in the standard library, but it is available on PyPI.
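If a third-party dependency is not an option, a class with __slots__ achieves a similar effect using only the standard library: the field names are stored once on the class, not in each instance. A rough stdlib-only sketch (the class name DataRow is just an illustration):

```python
import sys

class DataRow:
    # Field names live on the class; instances store only the values
    # and have no per-instance __dict__.
    __slots__ = ("first", "second", "third")

    def __init__(self, first, second, third):
        self.first = first
        self.second = second
        self.third = third

row = DataRow("red", "green", "blue")
print("slots instance:", sys.getsizeof(row), "bytes")
print("plain dict:    ",
      sys.getsizeof({"first": "red", "second": "green", "third": "blue"}),
      "bytes")
```

recordclass instances are somewhat smaller still (they shed some per-object overhead), but __slots__ already eliminates the per-instance key storage.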
Addendum: starting with recordclass 0.15 there is a fast_new option for faster instance creation:
>>> DataCls = make_dataclass('DataCls', 'first second third', fast_new=True)
If you don't need keyword arguments, instance creation is roughly twice as fast. Starting with 0.22 this is the default behavior and fast_new=True can be omitted.
P.S.: I am the author of the recordclass library.