Tags: python, dictionary-comprehension

Does Python optimize key-value pairs with None values?


Consider a dict with int keys and dict values, where each inner dict has string keys and further dicts as values.

Now consider these 2 representations of this dict:

>>> a_it_1
{1: {"it": None, "ndar": {1: 1}}, 2: {"it": {2: 8}, "ndar": None}}
>>> a_it_2
{1: {"ndar": {1: 1}}, 2: {"it": {2: 8}}}

As you can see, the inner dicts are sparse; not every inner key appears under every outer key.

a_it_2 excludes the key-value pairs whose values are None, in an attempt to save memory.

This is a simplified example; the actual dict contains more than 10,000 outer key-value pairs.

I am doing some debugging and can see that both of these use the same amount of memory.

I am using:

round(asizeof.asizeof({...}) / (1024 * 1024), 2)

where the ... represents the dict comprehension I am using to build the 2 dicts displayed above.

The asizeof function comes from Pympler's asizeof module.
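
For concreteness, the measurement looks roughly like this (a minimal sketch, assuming a_it_1 and a_it_2 have already been built by the comprehensions shown below):

from pympler import asizeof

size_1 = round(asizeof.asizeof(a_it_1) / (1024 * 1024), 2)  # deep size in MiB
size_2 = round(asizeof.asizeof(a_it_2) / (1024 * 1024), 2)
print(size_1, size_2)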

I am including my dict comprehensions below.

Retaining the inner sparse key-value pairs but setting None values:

a_it_1 = {
    k: {
        p: vi if type(vi) == int or type(vi) == bool or len(vi) > 0 else None
        for p, vi in v.items()
        if vi is not None
    }
    for k, v in a_it.items()
    if k < 10000
}

Removing the inner sparse key-value pairs completely:

a_it_2 = {
    k: {
        p: vi
        for p, vi in v.items()
        if vi is not None and (type(vi) == int or type(vi) == bool or len(vi) > 0)
    }
    for k, v in a_it.items()
    if k < 10000
}

Measuring both of these dicts with asizeof returns the same value: 12.3 MB each.

I would have expected the version that omits the key-value pairs entirely to use less memory.

Does Python internally optimize key-value pairs with None values?

Updated

My question may not have been clear. And the question I posed was perhaps the wrong one.

My current a_it is using a lot of memory. I would like to optimize it.

I created 2 new versions of the dict based on the comprehensions above.

One includes the key-value pairs with None values, and the other omits those key-value pairs entirely.

I would like to know why they still use the same amount of memory. I accept that Python does not optimize key-value pairs with None values, because it can't.

But is there any benefit to excluding the key-value pairs for not-yet-set values? My experiment did not show one, so I turned to the SO community for help.


Solution

  • No, it does not, and it can't. Consider that there is a difference between

    • the key is in the dictionary, and it has no value ({"foo": None})
    • the key is not in the dictionary, so the question about the value makes no sense ({})

    That is, simply put, "foo" in mydict will be True for {"foo": None}, and one needs to store that information.
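
    A quick REPL check makes the distinction concrete:

    >>> "foo" in {"foo": None}
    True
    >>> "foo" in {}
    False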

    If you're thinking about "sparse" dictionaries (which really isn't the right term here), you may want to look at "slotted dataclasses"; a dataclass using __slots__ is usually much more memory efficient with respect to the key-set, as it has a known universe of possible keys.
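
    A minimal sketch of what that could look like for the example above (the field names it and ndar are taken from the question; slots=True on @dataclass needs Python 3.10+, older versions would declare __slots__ by hand):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(slots=True)
    class Inner:
        it: Optional[dict] = None    # a "missing" value is simply left as None
        ndar: Optional[dict] = None

    a_it = {1: Inner(ndar={1: 1}), 2: Inner(it={2: 8})}

    Each Inner instance then stores just two slot references instead of carrying its own key table; the field names live once on the class.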

    When thinking about the actual memory requirements of two similarly sized dicts with similar content, it's important to keep in mind that

    • dict over-allocates its key table in order to avoid frequent resizing and rehashing as items are inserted. This means that two similarly sized dictionaries (for some fuzzy "similar") will often use exactly the same amount of memory, erring on the high side.
    • Strings in CPython are sometimes interned, when possible; in particular, identifier-like string literals that appear in the code are interned. Since all interned strings share the same memory backing, a second dict that shares many of its keys with the first only grows by the references to those shared str objects, which are roughly pointer-sized per key. Both points are demonstrated in the sketch below.
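
    Both effects can be observed with a minimal sketch (sys.getsizeof reports only the shallow size of the dict itself, which is enough to show the over-allocation; the exact numbers depend on the CPython version):

    import sys

    # Both dicts fit into the same over-allocated key table, so on CPython
    # they typically report exactly the same size.
    d6 = {i: None for i in range(6)}
    d9 = {i: None for i in range(9)}
    print(sys.getsizeof(d6), sys.getsizeof(d9))

    # Interned key strings are shared between dicts: the second dict only
    # pays for a reference to the already existing str object.
    d_a = {"it": 1}
    d_b = {"it": 2}
    print(next(iter(d_a)) is next(iter(d_b)))  # True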