Search code examples
pythondictionarydictionary-comprehension

Tuple-key dictionary in python: Accessing a whole block of entries


I am looking for an efficient python method to utilise a hash table that has two keys: E.g.:

(1,5) --> {a}
(2,3) --> {b,c}
(2,4) --> {d}

Further I need to be able to retrieve whole blocks of entries, for example all entries that have "2" at the 0-th position (here: (2,3) as well as (2,4)). In another post it was suggested to use list comprehension, i.e.:

sum(val for key, val in dict.items() if key[0] == 'B')

I learned that dictionaries are (probably?) the most efficient way to retrieve a value from an object of key:value-pairs. However, calling only an incomplete tuple-key is a bit different than querying the whole key where I either get a value or nothing. I want to ask if python can still return the values in a time proportional to the number of key:value-pairs that match? Or alternatively, is the tuple-dictionary (plus list comprehension) better than using pandas.df.groupby() (but that would occupy a bit much memory space)?


Solution

  • The "standard" way would be something like

    d = {(randint(1,10),i):"something" for i,x in enumerate(range(200))}
    
    def byfilter(n,d):
        return list(filter(lambda x:x==n, d.keys()))
    
    byfilter(5,d) ##returns a list of tuples where x[0] == 5
    

    Although in similar situations I often used next() to iterate manually, when I didn't need the full list.

    However there may be some use cases where we can optimize that. Suppose you need to do a couple or more accesses by key first element, and you know the dict keys are not changing meanwhile. Then you can extract the keys in a list and sort it, and make use of some itertools functions, namely dropwhile() and takewhile():

    ls = [x for x in d.keys()]
    ls.sort() ##I do not know why but this seems faster than ls=sorted(d.keys())
    
    def bysorted(n,ls):
        return list(takewhile(lambda x: x[0]==n, dropwhile(lambda x: x[0]!=n, ls)))
    bysorted(5,ls) ##returns the same list as above
    

    This can be up to 10x faster in the best case (i=1 in my example) and more or less take the same time in the worst case (i=10) because we are trimming the number of iterations needed.

    Of course you can do the same for accessing keys by x[1], you just need to add a key parameter to the sort() call