Tags: python, json, dictionary, nested, python-itertools

summing nested dictionary entries


I have a JSON file that I'm reading in as a dictionary. What I have is something like:

    {
        "20101021": {
            "4x4": {
                "Central Spectrum": 5,
                "Full Frame": 5,
                "Custom": 1
            },
            "4x2": {
                "Central Spectrum": 5,
                "Full Frame": 5
            },
            "1x1": {
                "Central Spectrum": 5,
                "Full Frame": 4
            }
        },
        "20101004": {
            "4x4": {
                "Central Spectrum": 5,
                "Full Frame": 5
            },
            "4x2": {
                "Central Spectrum": 5,
                "Full Frame": 5
            },
            "1x1": {
                "Central Spectrum": 5,
                "Full Frame": 5
            }
        }
    }

and so on, for more dates. I am trying to calculate sums over all dates for every combination of bin (1x1, 4x2, etc.) and ROI (Central Spectrum, Full Frame). In this example, I'd want to add up the 5s.

What I have so far is this (using itertools and Counter()):

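import collections
import itertools

# data is the dict loaded from the JSON file
bins = map("x".join, itertools.product('124', repeat=2))
rois = ['Full Frame', 'Central Spectrum']
types = itertools.product(bins, rois)
c = collections.Counter(data)  # the keys of data are the dates
for combo in types:
    print("%s : %d" % (combo, c[combo]))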

This prints out a nice list of all combinations, but fails to do any actual summing of values. Can you help?


Solution

  • Maybe I misunderstood the expected final result, but you might not need a Counter at all. A simple sum can suffice if you know the nesting depth is fixed (date, then bin, then ROI).

    Let's assume you loaded your JSON dictionary of dictionaries into a variable called data.

    Then you can do:

    results = {}
    for key in data.keys():
        # key is '20101021', '20101004'...
        # data[key].keys() is '4x4', '4x2'..., so make sure the result
        # dictionary contains all of those, starting at zero in case
        # nothing better can be calculated.
        results[key] = dict.fromkeys(data[key].keys(), 0)

        for sub_key in data[key].keys():
            # sub_key is '4x4', '4x2'...
            # Only count values under "Central Spectrum" or "Full Frame";
            # anything else (e.g. "Custom") is ignored.
            valid_values = [
                int(v) for k, v in data[key][sub_key].items()
                if k in ["Central Spectrum", "Full Frame"]
            ]
            # Now add up the valid values
            results[key][sub_key] = sum(valid_values)
    print(results)
    

    Which outputs:

    {
      u'20101021': {u'1x1': 9, u'4x4': 10, u'4x2': 10},
      u'20101004': {u'1x1': 10, u'4x4': 10, u'4x2': 10}
    }
    

    I mostly used dict.keys() (and dict.items() once) because I think it makes the process clearer. There is also dict.values(), and all three methods have iterator equivalents (iterkeys(), itervalues() and iteritems() in Python 2; in Python 3 the methods themselves return views), which might shorten your code. Also, see what dict.fromkeys does.

    EDIT (as per OP's comments to this answer)

    If you want the data added up (or "collected") over time, then you need to key results by '1x1', '4x4'... instead of by the date string (as the code above does):

    VALID_KEYS = ["Central Spectrum", "Full Frame"]
    results = {}
    for key_1 in data.keys():
        # key_1 is '20101021', '20101004'...
    
        for key_2 in data[key_1].keys():
            # key_2 is '4x4', '4x2'...
            if key_2 not in results:
                results[key_2] = dict.fromkeys(VALID_KEYS, 0)
            for key_3 in data[key_1][key_2].keys():
                # key_3 is 'Central Spectrum', 'Full Frame', 'Custom'...
                if key_3 in VALID_KEYS:
                    results[key_2][key_3] += data[key_1][key_2][key_3]
    print(results)
    

    Which outputs:

    {
        u'1x1': {'Central Spectrum': 10, 'Full Frame': 9},
        u'4x4': {'Central Spectrum': 10, 'Full Frame': 10},
        u'4x2': {'Central Spectrum': 10, 'Full Frame': 10}
    }
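The answer notes that dict.values() and dict.items() might shorten the code. Below is a minimal sketch of the over-all-dates version written that way, assuming Python 3. The sample from the question is inlined as a JSON string (the name raw is only for illustration) so the snippet runs without a file; with a real file you would call json.load on the open file object instead.

```python
import json

# Sample from the question, inlined as a JSON string (illustrative only).
# With a real file (path is a placeholder): with open(path) as f: data = json.load(f)
raw = """
{
    "20101021": {
        "4x4": {"Central Spectrum": 5, "Full Frame": 5, "Custom": 1},
        "4x2": {"Central Spectrum": 5, "Full Frame": 5},
        "1x1": {"Central Spectrum": 5, "Full Frame": 4}
    },
    "20101004": {
        "4x4": {"Central Spectrum": 5, "Full Frame": 5},
        "4x2": {"Central Spectrum": 5, "Full Frame": 5},
        "1x1": {"Central Spectrum": 5, "Full Frame": 5}
    }
}
"""
data = json.loads(raw)

VALID_KEYS = ["Central Spectrum", "Full Frame"]
results = {}
for per_bin in data.values():                 # one dict per date
    for bin_name, counts in per_bin.items():  # '4x4', '4x2', '1x1'
        # The first time a bin appears, start every valid key at 0
        totals = results.setdefault(bin_name, dict.fromkeys(VALID_KEYS, 0))
        for roi in VALID_KEYS:                # 'Custom' is never read
            totals[roi] += counts.get(roi, 0)
print(results)
```

Iterating over VALID_KEYS and using counts.get means a bin that lacks one of the ROIs on some date simply contributes 0 for that date.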
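The Counter from the question can also do the summing if its keys are (bin, ROI) tuples rather than dates. A sketch under the same assumptions (Python 3, the question's sample data inlined as a Python literal; totals and per_bin are illustrative names), after which the question's itertools.product loop finds real totals:

```python
import collections
import itertools

# Sample data from the question, as a Python literal (illustrative).
data = {
    "20101021": {
        "4x4": {"Central Spectrum": 5, "Full Frame": 5, "Custom": 1},
        "4x2": {"Central Spectrum": 5, "Full Frame": 5},
        "1x1": {"Central Spectrum": 5, "Full Frame": 4},
    },
    "20101004": {
        "4x4": {"Central Spectrum": 5, "Full Frame": 5},
        "4x2": {"Central Spectrum": 5, "Full Frame": 5},
        "1x1": {"Central Spectrum": 5, "Full Frame": 5},
    },
}
rois = ['Full Frame', 'Central Spectrum']

# Sum over all dates, keyed by (bin, roi) tuples instead of by date.
totals = collections.Counter()
for per_bin in data.values():                 # one dict per date
    for bin_name, counts in per_bin.items():  # '4x4', '4x2', '1x1'
        for roi in rois:                      # 'Custom' is skipped
            totals[(bin_name, roi)] += counts.get(roi, 0)

# The loop from the question, now looking up (bin, roi) tuple keys.
bins = map("x".join, itertools.product('124', repeat=2))
for combo in itertools.product(bins, rois):
    print("%s : %d" % (combo, totals[combo]))
```

Combinations that never occur in the data, such as ('1x2', 'Full Frame'), print as 0, because looking up a missing key in a Counter returns 0 without inserting it.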