
Get reference to Python dict key


In Python (3.7 and above) I would like to obtain a reference to a dict key. More precisely, let d be a dict where the keys are strings. In the following code, the value of k is potentially stored at two distinct locations in memory (one pointed to by the dict and one pointed to by k), whereas the value of v is stored at only one location (the one pointed to by the dict).

# d is a dict
# k is a string dynamically constructed, in particular not from iterating over d's keys
if k in d:
    v = d[k]
    # Now store k and v in other data structures
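
To make the duplication concrete, here is a small sketch. The key text and variable names are made up for illustration, and the identity result is what CPython does in practice rather than something the language guarantees.

```python
# Key stored in the dict, built at runtime (illustrative value).
d = {"".join(["alpha", "-", "beta"]): 1}

# An equal string built separately, e.g. from parsed input.
k = "-".join(["alpha", "beta"])

stored_key = next(iter(d))          # the str object the dict actually holds

same_value = (k == stored_key)      # equal contents
same_object = (k is stored_key)     # but two separate objects in memory
print(same_value, same_object)
```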

In my case, the dict is very large and the string keys are very long. To keep memory usage down I would like to replace k with a pointer to the corresponding string used by d before storing k in other data structures. Is there a straightforward way of doing this, that is, of using the keys of the dict as a string pool?

(Footnote: this may seem like premature optimisation, and perhaps it is, but being an old-school C programmer I sleep better at night doing "memory tricks". Joking aside, I would genuinely like to know the answer out of curiosity, and I am indeed going to run my code on a Raspberry Pi and will probably face memory issues.)


Solution

  • Where does the key k come from? Is it dynamically constructed by something like str.join, +, slicing another string, bytes.decode, etc.? Is it read from a file or input()? Did you get it from iterating over d at some point? Or does it originate from a literal somewhere in your source code?

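    In the last two cases, you don't need to worry about it since it is going to be a single instance anyway: iterating over d hands back the very key objects the dict holds, and in CPython a string literal is built once when the code is compiled rather than each time the line runs.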

    If not, you could use sys.intern to intern your keys: apply it to each key as you insert it into d, and to k before you store it elsewhere. For strings, if a == b then sys.intern(a) is sys.intern(b).

    Alternatively, if you want explicit control over when the pooled strings can be released, or you want to pool non-string values such as tuples of strings, you could build your own lookup table:

    # create this dictionary once after `d` has all the right keys
    canonical_keys = {key: key for key in d}
    
    k = canonical_keys.get(k, k) # use the same instance if possible
    

    I recommend reading up on Python's data model.
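
Both approaches above can be checked with a short sketch. The helper build_key and the sample key parts are hypothetical, chosen only to produce equal strings that start out as separate objects; the identity checks reflect CPython behaviour.

```python
import sys


def build_key(parts):
    # Hypothetical helper: builds a fresh str object at runtime.
    return "/".join(parts)


# Approach 1: sys.intern, applied both on insertion and before storing k.
d = {sys.intern(build_key(["sensor", "room-1", "temp"])): 21.5}

k = build_key(["sensor", "room-1", "temp"])  # equal to the key, separate object
k = sys.intern(k)                            # now refers to the interned object
interned_shared = k is next(iter(d))

# Approach 2: a canonical-key dict built from the existing keys of the dict.
d2 = {build_key(["sensor", "room-2", "temp"]): 19.0}
canonical_keys = {key: key for key in d2}

k2 = build_key(["sensor", "room-2", "temp"])
k2 = canonical_keys.get(k2, k2)              # swap in the dict's own key object
canonical_shared = k2 is next(iter(d2))

# A string that is not a key falls through unchanged.
missing = build_key(["no", "such", "key"])
fallback_same = canonical_keys.get(missing, missing) is missing

print(interned_shared, canonical_shared, fallback_same)
```

Note that canonical_keys stores only references to the key objects d2 already holds, not copies of the strings, although the extra hash table itself takes some memory.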