python, dictionary, keyerror

Filter values inside Python generator expressions


I have a dictionary dct for which I want each of its values to be summed provided their corresponding keys exist in a specified list lst.

The code I am using so far is:

sum(dct[k] for k in lst)

In the above generator expression, I would like to handle the KeyError raised when a key from the list is not found in the dictionary. I cannot work out the syntax for either a try-except or an if-else approach inside this generator expression.

In case a key from the list is not found inside the dictionary, then it should carry on getting the other values. The end result of the sums should not be affected by any missing keys. In case none of the keys exist, then zero should be the sum's result.


Solution

  • There are a few options. The preferred one is dict.get() with a default of 0 (#1); an if clause in the generator expression (#2) also works:

    # 1
    sum(dct.get(k, 0) for k in lst)
    # 2
    sum(dct[k] for k in lst if k in dct)
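
To see how both forms behave with missing keys, here is a small sketch. The contents of dct and lst are made up for illustration and are not from the question:

```python
# Illustrative data (hypothetical): "b" and "z" are not keys of dct.
dct = {"a": 1, "c": 3, "d": 10}
lst = ["a", "b", "c", "z"]

# 1: missing keys contribute the default value 0
total_get = sum(dct.get(k, 0) for k in lst)

# 2: missing keys are skipped by the `if` clause
total_cond = sum(dct[k] for k in lst if k in dct)

# No key found at all: sum() over an empty generator is 0
total_none = sum(dct[k] for k in ["x", "y"] if k in dct)

print(total_get, total_cond, total_none)  # 4 4 0
```

Both forms give the same total for numeric values, and both return zero when none of the keys exist.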
    

    Another option is to filter lst before iterating over it:

    sum(dct[k] for k in filter(lambda i: i in dct, lst))
    

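    You can also use reduce on the filtered list as an alternative to sum (in Python 3, reduce must be imported from functools). Pass 0 as the initial value; otherwise the first key itself becomes the starting accumulator, and an empty filtered list raises TypeError:

    reduce(lambda a, k: a + dct[k], filter(lambda i: i in dct, lst), 0)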
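
A note on the initial value for reduce: when none is passed, reduce seeds the accumulator with the first element of the sequence, which here is a key rather than a dictionary value. The sketch below shows the difference; the data and the helper name add_value are assumptions made for illustration:

```python
from functools import reduce  # reduce is a builtin only on Python 2

# Illustrative data (hypothetical); key 3 is missing from dct.
dct = {1: 10, 2: 20}
lst = [1, 2, 3]

def add_value(acc, k):
    """Add the dictionary value for key k to the running total."""
    return acc + dct[k]

present = [k for k in lst if k in dct]

# Without an initial value the accumulator starts as the first *key* (1),
# so the result is 1 + dct[2] rather than dct[1] + dct[2].
without_init = reduce(add_value, present)

# With 0 as the initial value, every key is looked up.
with_init = reduce(add_value, present, 0)

# An empty sequence is only safe when an initial value is given.
empty_total = reduce(add_value, [], 0)

print(without_init, with_init, empty_total)  # 21 30 0
```

The mistake is easy to miss when keys and values happen to be equal, as they are in a dictionary built like {x: x}.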
    

    Now let's find the fastest approach with timeit:

    from timeit import timeit
    from functools import reduce  # needed on Python 3; harmless on Python 2
    import random
    
    lst = range(0, 10000)
    dct = {x:x for x in lst if random.choice([True, False])}
    
    via_sum = lambda:(sum(dct.get(k, 0) for k in lst))
    print("Via sum and get: %s" % timeit(via_sum, number=10000))
    # Via sum and get: 16.725695848464966
    
    via_sum_and_cond = lambda:(sum(dct[k] for k in lst if k in dct))
    print("Via sum and condition: %s" % timeit(via_sum_and_cond, number=10000))
    # Via sum and condition: 9.4715681076
    
    # The initial value 0 makes reduce add dct[k] for every key, including the first
    via_reduce = lambda:(reduce(lambda a, k: a + dct[k], filter(lambda i: i in dct, lst), 0))
    print("Via reduce: %s" % timeit(via_reduce, number=10000))
    # Via reduce: 19.9522120953
    

    So, in this benchmark, the fastest option is to sum the items with an if clause inside the generator expression:

    sum(dct[k] for k in lst if k in dct) # Via sum and condition: 9.4715681076
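
As for the try-except approach mentioned in the question: a generator expression can only contain expressions, so a try statement cannot go inside one, but it can go inside a generator function. This is a minimal sketch of that alternative, not one of the benchmarked options; the helper name existing_values and the sample data are hypothetical:

```python
def existing_values(mapping, keys):
    """Yield mapping[k] for each k in keys, skipping keys that are missing."""
    for k in keys:
        try:
            value = mapping[k]
        except KeyError:
            # Key not present: carry on with the next key
            continue
        yield value

# Illustrative data (hypothetical); "b" is missing from dct.
dct = {"a": 1, "c": 3}
lst = ["a", "b", "c"]

total = sum(existing_values(dct, lst))
empty_total = sum(existing_values(dct, ["x", "y"]))

print(total, empty_total)  # 4 0
```

Keeping the lookup alone inside the try block means only a missing key is caught, not an unrelated KeyError raised elsewhere.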
    

    Good luck!