Tags: python, sum, max, counter, defaultdict

Finding the max of sum of the values in the inner nested defaultdict


Given a defaultdict(Counter) with contents like this (written below as a plain dict literal):

from collections import defaultdict, Counter

x = {('a', 'z'): Counter({'crazy': 1, 'lazy': 1}),
     ('b', 'r'): Counter({'brown': 1}),
     ('d', 'o'): Counter({'dog': 1}),
     ('e', 'r'): Counter({'over': 1}),
     ('f', 'o'): Counter({'fox': 1}),
     ('h', 'e'): Counter({'the': 2}),
     ('j', 'u'): Counter({'jumps': 1}),
     ('l', 'a'): Counter({'lazy': 1}),
     ('m', 'p'): Counter({'jumps': 1}),
     ('o', 'g'): Counter({'dog': 1}),
     ('o', 'v'): Counter({'over': 1}),
     ('o', 'w'): Counter({'brown': 1}),
     ('o', 'x'): Counter({'fox': 1}),
     ('p', 's'): Counter({'jumps': 1}),
     ('r', 'o'): Counter({'brown': 1}),
     ('t', 'h'): Counter({'the': 2}),
     ('u', 'm'): Counter({'jumps': 1}),
     ('v', 'e'): Counter({'over': 1}),
     ('w', 'n'): Counter({'brown': 1}),
     ('z', 'y'): Counter({'crazy': 1, 'lazy': 1})}

I can look up the Counter stored under a tuple key like this:

>>> x[('a', 'z')]
Counter({'crazy': 1, 'lazy': 1})

To find the tuple key whose inner Counter has the highest sum of values, I can do this:

>>> max([(sum(x[ng].values()), ng) for ng in x])
(2, ('z', 'y'))
>>> max([(sum(x[ng].values()), ng) for ng in x])[1]
('z', 'y')

These steps seem a little convoluted just to get the max. Is there a more straightforward way to find the key with the maximum sum of values in the inner Counter?

Note: as far as possible, the solution should not create another object from x. This sample is small, but the actual x can contain 1,000,000 keys, and the inner Counters can hold around 10,000,000 entries.


Solution

  • You can capture the value during iteration with x.items() to avoid looking it up again with [], though the result is not much cleaner:

    max((sum(c.values()), key) for key, c in x.items())
    

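    Alternatively, you can pass max a key function, which reads nicely if you only want the key of the max entry. If several keys tie for the maximum, this returns the first one encountered while iterating over x, whereas the tuple comparison above breaks ties by picking the largest key: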

    max(x, key=lambda k: sum(x[k].values()))
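
The two ideas can also be combined. The following is a minimal sketch rather than a definitive answer: it assumes Python 3, where x.items() is a view and does not copy the dictionary, and it uses a small stand-in for x. The helper name inner_total is hypothetical and only there for illustration.

```python
from collections import Counter

# Small stand-in for x; the real mapping is assumed to have the same shape.
x = {
    ('a', 'z'): Counter({'crazy': 1, 'lazy': 1}),
    ('b', 'r'): Counter({'brown': 1}),
    ('t', 'h'): Counter({'the': 2}),
    ('z', 'y'): Counter({'crazy': 1, 'lazy': 1}),
}


def inner_total(item):
    """Return the sum of the counts for one (key, Counter) pair."""
    return sum(item[1].values())


# max compares the (key, Counter) pairs by their totals only.
# No intermediate list is built and each Counter is reached without x[k].
best_key, best_counter = max(x.items(), key=inner_total)
best_total = sum(best_counter.values())

# Generator version of the tuple approach: ties go to the largest key.
tuple_total, tuple_key = max((sum(c.values()), k) for k, c in x.items())
```

Both variants walk every inner Counter once, so the running time is proportional to the total number of inner entries. The only extra memory is the pair or tuple currently being compared.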