In Python (3.7 and above) I would like to obtain a reference to a dict key. More precisely, let d be a dict where the keys are strings. In the following code, the value of k is potentially stored at two distinct locations in memory (one pointed to by the dict and one pointed to by k), whereas the value of v is stored at only one location (the one pointed to by the dict).
# d is a dict
# k is a string dynamically constructed, in particular not from iterating over d's keys
if k in d:
    v = d[k]
    # Now store k and v in other data structures
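To make the duplication concrete, here is a minimal sketch with a hypothetical dict whose long keys are assembled at runtime. It shows that k compares equal to the dict's key but, in CPython, is a separate object whose full size is paid again:

```python
import sys

# Hypothetical data: long keys assembled at runtime.
parts = ["x" * 1000, "y" * 1000]
d = {"-".join(parts): 42}

# k is built independently, not taken from d's keys.
k = "-".join(parts)
if k in d:
    v = d[k]

stored_key = next(iter(d))  # the key object the dict itself holds
print(k == stored_key)      # True: equal contents
print(k is stored_key)      # False in CPython: a separate object
print(sys.getsizeof(k))     # more than 2001 bytes, paid once per copy
```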
In my case, the dict is very large and the string keys are very long. To keep memory usage down, I would like to replace k with a pointer to the corresponding string used by d before storing k in other data structures. Is there a straightforward way of doing this, that is, of using the keys of the dict as a string pool?
(Footnote: this may seem like premature optimisation, and perhaps it is, but being an old-school C programmer I sleep better at night doing "memory tricks". Jokes aside, I genuinely would like to know the answer out of curiosity, and I am indeed going to run my code on a Raspberry Pi and will probably face memory issues.)
Where does the key k come from? Is it dynamically constructed by something like str.join, concatenation with +, slicing another string, bytes.decode, etc.? Is it read from a file or input()? Did you get it from iterating over d at some point? Or does it originate from a literal somewhere in your source code?
In the last two cases, you don't need to worry about it since it is going to be a single instance anyway.
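For the iteration case, a minimal sketch (the dict d and its keys are hypothetical) showing that keys collected while iterating over d are the dict's own objects, so storing them in another structure adds references rather than copies:

```python
# Hypothetical dict with keys built at runtime.
d = {"-".join(["key", str(i)]): i for i in range(3)}

# Each `key` produced by iteration is the very object stored in d,
# so this list holds references to d's keys, not new strings.
even_keys = [key for key, value in d.items() if value % 2 == 0]

stored_ids = {id(key) for key in d}
print(all(id(key) in stored_ids for key in even_keys))  # True
```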
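If not, you could use sys.intern to intern your keys: if a == b, then sys.intern(a) is sys.intern(b). For this to share memory with the dict, the keys must also be interned when they are inserted into d; otherwise sys.intern(k) will generally not return the object the dict holds.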
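A minimal sketch of the interning approach, assuming you control the code that builds d so that its keys can be interned on insertion (the key parts are hypothetical):

```python
import sys

parts = ["some", "very", "long", "key"]  # hypothetical key parts

# Intern each key as it is inserted, so d holds the canonical objects.
d = {}
d[sys.intern("-".join(parts))] = 42

# Later, k is built independently: a new, separate string object.
k = "-".join(parts)

# Interning k returns the object already stored as d's key.
k = sys.intern(k)
print(k is next(iter(d)))  # True
```

The cost of this design is discipline: every place that builds keys for d, or looks them up, has to remember to call sys.intern.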
Another possible solution, useful if you might want to garbage-collect the strings at some point, or if you want to intern non-string values such as tuples of strings, is the following:
# create this dictionary once after `d` has all the right keys
canonical_keys = {key: key for key in d}
k = canonical_keys.get(k, k) # use the same instance if possible
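A runnable sketch of the canonical_keys approach; make_key and canonical are hypothetical helpers, and the tuple key illustrates that the trick works for any hashable value, not just strings:

```python
def make_key(*parts):
    """Hypothetical helper: builds a new string object at runtime."""
    return "-".join(parts)

# d has a string key and a tuple-of-strings key, both built at runtime.
d = {
    make_key("alpha", "beta"): 1,
    (make_key("x", "y"), make_key("z", "w")): 2,
}

# Create this once after d has all the right keys: each key maps to itself.
canonical_keys = {key: key for key in d}

def canonical(k):
    """Return d's own key object if an equal key exists, else k unchanged."""
    return canonical_keys.get(k, k)

stored_str_key, stored_tuple_key = list(d)  # insertion order (3.7+)

k1 = canonical(make_key("alpha", "beta"))
k2 = canonical((make_key("x", "y"), make_key("z", "w")))
k3 = canonical("not-in-d")

print(k1 is stored_str_key)    # True
print(k2 is stored_tuple_key)  # True
print(k3)                      # not-in-d
```

canonical_keys stores only references to d's existing key objects, so it copies no strings, although its own hash table still costs memory proportional to len(d).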
I recommend reading up on Python's data model.