I have a dictionary whose keys come in sets that share the same prefix, like this:
d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
"key2":"valD", "key2-22":"valE" }
Given a query string, I need to look up all the values associated with keys that start with that prefix, e.g. for query="key1"
I need to get ["valA", "valB", "valC"]
My implementation below works but is too slow for a large number of queries since the dictionary d
has about 30,000 keys and most of the keys are more than 20 characters long:
result = [d[s] for s in d.keys() if s.startswith(query)]
Is there a faster/more efficient way to implement this?
You can avoid producing the intermediate list generated by dict.keys()
(in python 2.x):
result = [d[key] for key in d if key.startswith(query)]
But you most likely want to use a trie instead of a dictionary, so you can find all the values associated with a key with a common prefix (a trie is similar to a tree based on prefixes).
Here you can find some different implementation of tries.
A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (source wikipedia)
Let's compare the timings for the different solutions:
# create a dictionary with 30k entries
d = {str(x):str(x) for x in xrange(1, 30001)}
query = '108'
# dict with keys()
%timeit [d[s] for s in d.keys() if s.startswith(query)]
100 loops, best of 3: 8.87 ms per loop
# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]
100 loops, best of 3: 7.83 ms per loop
# 11.72% improvement
# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)
%timeit [pt[s] for s in pt.iterkeys(query)]
1000 loops, best of 3: 320 µs per loop
# 96.36% improvement
# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
dt[unicode(key)] = val
%timeit [dt[s] for s in dt.keys(unicode(query))]
10000 loops, best of 3: 162 µs per loop
# 98.17% improvement