Search code examples
pythonpython-3.xternarydictionary-comprehension

Incrementing dict values with dict comprehension


I'm trying to do the following expression with dict comprehension and a ternary operation in python:

for num in ar:
     if num in seen_dict:
         seen_dict[num] += 1
     else:
         seen_dict[num] = 1

I tried this:

seen_dict = { num: seen_dict[num] += 1 for num in ar if num in seen_dict else seen_dict[num] = 1}

and several permutations thereof, but I keep getting syntax errors. Is it possible to do what I want?

UPDATE

This is the correct syntax, but not my dict only comes back with 1's: seen_dict = { num: (seen_dict[num] + 1) if num in seen_dict else 1 for num in ar }

Can someone explain why this doesn't function the same way as the for loop? Thank you.


Solution

  • Don't. It seems like using dict comprehensions for this should be a good idea, but it's actually a horrible trap. Use collections.Counter:

    import counts
    
    seen_dict = collections.Counter(ar)
    

    or if you don't want to do that, then stick with the loop.

    The problem with trying to use a dict comprehension is that a dict comprehension has no good way to maintain state or interleave the computation of the values of each key. Each value must be computed in a single expression. In contrast, the best way to solve your counting problem is to make a single pass over ar and update each element's count as you go.

    The restrictions of a comprehension lead to horribly inefficient attempts like

    seen_dict = {val: ar.count(val) for val in ar}
    

    which makes a number of passes over ar equal to the length of ar, or the slightly more efficient but still horribly suboptimal

    seen_dict = {val: ar.count(val) for val in set(ar)}
    

    which only needs to make len(set(ar)) passes, or for people a bit more familiar with the standard library,

    from itertools import groupby
    seen_dict = {val: sum(1 for _ in group) for val, group in groupby(sorted(ar))}
    

    which at least isn't quadratic time, but is still O(nlogn) for a length-n ar.

    If we run a timing of these four snippets with input list(range(10000)):

    from collections import Counter
    from itertools import groupby
    from timeit import timeit
    
    ar = list(range(10000))
    
    print(timeit('Counter(ar)', number=1, globals=globals()))
    print(timeit('{val: ar.count(val) for val in ar}', number=1, globals=globals()))
    print(timeit('{val: ar.count(val) for val in set(ar)}', number=1, globals=globals()))
    print(timeit('{val: sum(1 for _ in group) for val, group in groupby(sorted(ar))}',
                 number=1, globals=globals()))
    

    We get the following output:

    0.0005530156195163727
    1.0503493696451187
    1.0463058911263943
    0.00422721728682518
    

    Counter finishes in half a millisecond, while the count snippets both take over a second. (The set version seems to have a lower runtime due to some sort of first-run effect slowing down the other version; swapping the order of the set and non-set version usually reverses the relative timing of those versions. The deduplication of set doesn't help in this test, since the input has no duplicates.)

    For a longer input, relying on count would be even more prohibitively expensive. Relying on count could easily take days for an input that Counter would still finish in under a second.