
Sum Values in a Nested Dictionary Containing Duplicates


I have a nested defaultdict whose inner keys are 2-tuples, and I'd like to sum the values of the inner keys that share the same second element of the tuple.

For example:

dict = defaultdict(lambda: defaultdict(int))

dict[key][elem_one, elem_two] += 1  # defaultdict format

# The keys (elem1, elem2) and their values (ints) under the first outer key:
(a1, b1) = 1
(a2, b2) = 1
(a2, b1) = 7
(a3, b3) = 4
(a3, b1) = 10

I'd like to sum the values for each of b1, b2, and b3, regardless of the value of a in the first element of the tuple, and output a dict mapping each b to its summed value. Desired output:

{b1 = 18, b2 = 1, b3 = 4}

What I've tried so far is:

out_set = {k[1]: v for k, v in dict[key].items()}  # k[1] gives me b

This gives me the correct format, but the wrong numbers! It's not summing. I get something like:

{b1 = 1, b2 = 1, b3 = 1}

So I tried to change my code as follows:

out_set = {k[1]: sum(v) for k, v in dict[key].items()}  # k[1] gives me b

But I get the following error:

TypeError: 'int' object is not iterable

How can I correct this?

In my actual code, the defaultdict is much larger. For instance, one key can contain 16 (elem1, elem2) tuples with their respective values. What I've found is that the comprehension removes any duplicate elem2 (which is desired), but it seems to keep a random one of the values associated with that elem2 rather than the sum over the duplicate elem2s (which is what I want).
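
Both attempts can be reproduced in isolation, which helps show what is going wrong. In the sketch below, the string keys are placeholders standing in for a1, b1, and so on. The dict comprehension assigns to the same key repeatedly, so the value that survives for each b is the last one seen in iteration order rather than a sum, and `sum(v)` fails because each `v` is a single int, not an iterable.

```python
from collections import defaultdict

# Placeholder data mirroring the inner dict from the question.
inner = defaultdict(int)
inner["a1", "b1"] = 1
inner["a2", "b2"] = 1
inner["a2", "b1"] = 7
inner["a3", "b3"] = 4
inner["a3", "b1"] = 10

# First attempt: each repeated k[1] overwrites the previous entry,
# so only the last value per b survives.
last_wins = {k[1]: v for k, v in inner.items()}
print(last_wins)  # {'b1': 10, 'b2': 1, 'b3': 4}

# Second attempt: v is an int, so sum(v) raises TypeError.
error_message = None
try:
    {k[1]: sum(v) for k, v in inner.items()}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```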


Solution

  • I suggest using another defaultdict and taking advantage of the fact that defaultdict(int) gives a missing key the default value int(), which is 0. That lets you add to a key that was never explicitly initialized, because it is implicitly initialized to 0. Something like this:

    import collections
    
    dct = {
        (1, 1): 1,  # This will be overwritten by (1, 1): 7
        (2, 2): 1,
        (1, 1): 7,
        (3, 3): 4,
        (3, 1): 10
    }
    
    aux_dict = collections.defaultdict(int)
    for key, val in dct.items():
        aux_dict[key[1]] += val
    print(aux_dict) 
    

    This will output (the repr shown is from Python 3):

    defaultdict(<class 'int'>, {1: 17, 2: 1, 3: 4})
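
    The same loop carries over to the nested structure from the question, where each outer key maps to its own dict of tuples. A minimal sketch, assuming the layout data[outer_key][(elem_one, elem_two)]; the helper name sum_by_second and the sample keys are made up for illustration:

```python
from collections import defaultdict


def sum_by_second(inner):
    """Sum the values of an inner dict, grouped by the tuple's second element."""
    totals = defaultdict(int)
    for (_, second), value in inner.items():
        totals[second] += value
    return dict(totals)


# Hypothetical nested data: outer key -> {(elem_one, elem_two): count}
data = defaultdict(lambda: defaultdict(int))
data["k"]["a1", "b1"] += 1
data["k"]["a2", "b1"] += 7
data["k"]["a2", "b2"] += 1

per_key = {outer: sum_by_second(inner) for outer, inner in data.items()}
print(per_key)  # {'k': {'b1': 8, 'b2': 1}}
```

    Returning a plain dict from the helper avoids accidentally creating new zero-valued entries when the result is read later with a key that is not present.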
    

    However, because of how tuples (and dicts) work, I think it's important to stress that you cannot have two keys that are equal. Meaning, this...

    dct = {
        (1, 1): 1,  # This will be overwritten by (1, 1): 7
        (1, 1): 7,
    }
    print(dct)
    

    ... will just output {(1, 1): 7}. If you need to keep duplicate keys somehow, that's a different issue: it is not about summing values, but about how to keep "duplicate" keys in a dictionary. I put "duplicate" in quotes because you really can't have two keys that are the same. You'd need to figure out a way to make those keys distinct.

    What I mean by the paragraph above is that you won't get b1: 18 from the sample dict above, because (a1, b1): 1 is not in the dictionary of values to be summed in the first place: it is replaced by (a1, b1): 7 later in the code.
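
    For comparison, with the keys exactly as listed in the question, the value 7 belongs to (a2, b1) rather than to a second (a1, b1), so no key repeats and nothing is overwritten. A minimal sketch with placeholder strings for the a and b values, under which the same loop gives 18 for b1:

```python
from collections import defaultdict

# Keys as listed in the question; strings are placeholders for the real values.
dct = {
    ("a1", "b1"): 1,
    ("a2", "b2"): 1,
    ("a2", "b1"): 7,
    ("a3", "b3"): 4,
    ("a3", "b1"): 10,
}

aux_dict = defaultdict(int)
for key, val in dct.items():
    aux_dict[key[1]] += val  # group by the second element of the tuple

print(dict(aux_dict))  # {'b1': 18, 'b2': 1, 'b3': 4}
```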