Tags: python, dictionary, collections, multiple-inheritance, toolz

Python Counter with defaultdict(int) behaviour


Consider the following piece of code:

from collections import Counter
from cytoolz import merge_with

my_list = ["a", "b", "a", "a", "c", "d", "b"]
my_dict = {"a" : "blue", "b" : "green", "c" : "yellow", "d" : "red", "e" : "black"}

pair_dict = merge_with(tuple, my_dict, Counter(my_list))

I obtain the following pair_dict:

{'a': ('blue', 3),
 'b': ('green', 2),
 'c': ('yellow', 1),
 'd': ('red', 1),
 'e': ('black',)}

In my real application, I need the values in pair_dict to be pairs, so pair_dict["e"] should be ('black', 0).

It would be very convenient if I could have a class that extends Counter with the nice behaviour of a defaultdict(int).

Is this easily done?

I naïvely tried the following:

from collections import defaultdict

class DefaultCounter(defaultdict, Counter):
    pass

pair_dict = merge_with(tuple, my_dict, DefaultCounter(my_list))

But I get TypeError: first argument must be callable or None. I guess this is because defaultdict expects a factory function as its first argument.

So I tried the following:

pair_dict = merge_with(tuple, my_dict, DefaultCounter(int, my_list))

This results in ValueError: dictionary update sequence element #0 has length 1; 2 is required.

I also tried class DefaultCounter(Counter, defaultdict) but this does not have the desired effect: pair_dict["e"] is still ('black',).

Probably something else should be done in the definition of the class.

So I tried to adapt this answer:

class DefaultCounter(Counter):
    def __missing__(self, key):
        self[key] = 0
        return 0

But this doesn't have the desired effect either (pair_dict["e"] is still missing its second element).


Edit: Counter already behaves as defaultdict(int), but merge_with does not trigger this behaviour.

As suggested in the comments, a Counter already has the desired behaviour:

my_counts = Counter(my_list)
assert my_counts["e"] == 0
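
To make the comparison with defaultdict(int) concrete, here is a minimal sketch (the names letters, counts and dd are only illustrative). It shows that both return 0 for a missing key, but only the defaultdict stores the key as a side effect of the lookup:

```python
from collections import Counter, defaultdict

letters = ["a", "b", "a"]

counts = Counter(letters)
assert counts["e"] == 0      # looking up a missing key returns 0
assert "e" not in counts     # but the Counter does not store the key

dd = defaultdict(int)
for letter in letters:
    dd[letter] += 1
assert dd["e"] == 0          # looking up a missing key returns 0
assert "e" in dd             # and the defaultdict now stores the key
```

In both cases the default only comes into play on an explicit lookup such as counts["e"]; iterating over the container never produces keys that were not stored.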

The issue actually lies in the way merge_with works: it only sees the keys already stored in each dict, so it never triggers the missing-key behaviour.

This is verified by the following test using a defaultdict instead of a Counter:

from collections import defaultdict
my_counts = defaultdict(int)
for letter in my_list:
    my_counts[letter] += 1
pair_dict = merge_with(tuple, my_dict, my_counts)
assert pair_dict["e"] == ('black',)
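
The result above is what one would expect from any merge that simply walks over the items of each dict. The sketch below uses a hypothetical helper, simple_merge_with, written only for illustration; it is an assumption about the general approach, not the actual cytoolz implementation:

```python
from collections import Counter

def simple_merge_with(func, *dicts):
    """Hypothetical stand-in for merge_with, for illustration only."""
    grouped = {}
    for d in dicts:
        # Only keys actually stored in d are visited. d[key] is never
        # evaluated for an absent key, so __missing__ cannot fire.
        for key, value in d.items():
            grouped.setdefault(key, []).append(value)
    return {key: func(values) for key, values in grouped.items()}

colours = {"a": "blue", "e": "black"}
counts = Counter(["a", "a"])
pairs = simple_merge_with(tuple, colours, counts)
assert pairs == {"a": ("blue", 2), "e": ("black",)}
```

Under this assumption, no subclass of Counter or defaultdict can help, because the default value is only produced by an explicit lookup of the missing key.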

One must therefore ensure that all keys have been created in the Counter before merging with the other dict, for instance using this trick.
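
One possible way to pre-create the keys, which may differ from the linked trick, is to seed the Counter with a zero count for every key of my_dict before counting. This is only a sketch; it relies on the fact that a Counter keeps keys whose count is 0:

```python
from collections import Counter

my_list = ["a", "b", "a", "a", "c", "d", "b"]
my_dict = {"a": "blue", "b": "green", "c": "yellow", "d": "red", "e": "black"}

# Start every key of my_dict at a count of 0, then add the real counts.
my_counts = Counter(dict.fromkeys(my_dict, 0))
my_counts.update(my_list)  # update() works in place and returns None

assert "e" in my_counts and my_counts["e"] == 0
assert my_counts["a"] == 3
```

Since "e" is now a stored key, a merge that iterates over the items of my_counts would see it with a count of 0.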


Solution

  • Combining this answer with the use of merge_with, I came up with the following solution:

    from collections import Counter
    from cytoolz import merge_with
    
    my_list = ["a", "b", "a", "a", "c", "d", "b"]
    my_dict = {
        "a" : "blue", "b" : "green", "c" : "yellow", "d" : "red", "e" : "black"}
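    # Count every key of my_dict once so that all keys exist in the Counter.
    my_counts = Counter(my_dict.keys())
    # update() works in place and returns None, so it cannot be chained.
    my_counts.update(my_list)
    pair_dict = merge_with(
        tuple, my_dict,
        {k : v - 1 for (k, v) in my_counts.items()})  # remove the extra count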
    assert pair_dict["e"] == ('black', 0)
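
If merge_with is not strictly required, a plain dict comprehension may be a simpler alternative, because indexing the Counter directly does go through its missing-key handling. This is a sketch of a different technique, not the approach above, and it assumes that only the keys of my_dict are wanted in the result:

```python
from collections import Counter

my_list = ["a", "b", "a", "a", "c", "d", "b"]
my_dict = {"a": "blue", "b": "green", "c": "yellow", "d": "red", "e": "black"}

my_counts = Counter(my_list)
# my_counts[k] returns 0 for keys that never occur in my_list.
pair_dict = {k: (colour, my_counts[k]) for k, colour in my_dict.items()}

assert pair_dict["e"] == ("black", 0)
assert pair_dict["a"] == ("blue", 3)
```

Unlike merge_with, this drops any key that occurs in my_list but not in my_dict.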