python, list, dictionary, list-comprehension

An efficient way to aggregate a list of dictionaries


I have a list of Python dictionaries and I'm trying to aggregate the values of each key with a different function (max or min, depending on the key).

Right now, I convert the list of dicts to a pandas DataFrame and then use its agg method to get my desired output.

But doing so adds time and memory overhead. I would appreciate some help making this faster without resorting to pandas.

What I've done so far:

import pandas as pd

boxes = [{'width': 178.25, 'right': 273.25, 'top': 535.0, 'left': 95.0, 'bottom': 549.0, 'height': 14.0}, {'width': 11.17578125, 'right': 87.17578125, 'top': 521.0, 'left': 76.0, 'bottom': 535.0, 'height': 14.0}, {'width': 230.8515625, 'right': 306.8515625, 'top': 492.0, 'left': 76.0, 'bottom': 506.0, 'height': 14.0}, {'width': 14.65234375, 'right': 90.65234375, 'top': 535.0, 'left': 76.0, 'bottom': 549.0, 'height': 14.0}, {'width': 7.703125, 'right': 83.703125, 'top': 506.0, 'left': 76.0, 'bottom': 520.0, 'height': 14.0}, {'width': 181.8515625, 'right': 276.8515625, 'top': 521.0, 'left': 95.0, 'bottom': 535.0, 'height': 14.0}, {'width': 211.25, 'right': 306.25, 'top': 506.0, 'left': 95.0, 'bottom': 520.0, 'height': 14.0}]

# Aggregate each edge with its own function, then derive the size.
df = pd.DataFrame(boxes)
agg = df.agg({'left': min, 'right': max, 'top': min, 'bottom': max})
agg['height'] = agg['bottom'] - agg['top']
agg['width'] = agg['right'] - agg['left']
res = agg.to_dict()

Desired Result

{'left': 76.0, 'right': 306.8515625, 'top': 492.0, 'bottom': 549.0, 'height': 57.0, 'width': 230.8515625}

Solution

  • Here's one approach:

    (i) Use dict.setdefault to merge the dictionaries into a single dictionary, temp, that maps each key in functions to a list of its values.

    (ii) Traverse temp and apply the matching function from functions to each key's list of values.

    (iii) 'height' and 'width' are not in functions, so calculate them separately from the aggregated values.

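    functions = {'left': min, 'right': max, 'top': min, 'bottom': max}

    # (i) Collect the values of each key of interest into one list per key.
    temp = {}
    for d in boxes:
        for k, v in d.items():
            if k in functions:
                temp.setdefault(k, []).append(v)

    # (ii) Reduce each list with its aggregation function.
    out = {k: functions[k](v) for k, v in temp.items()}

    # (iii) Derive the size from the aggregated edges.
    out['height'] = out['bottom'] - out['top']
    out['width'] = out['right'] - out['left']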

    Output:

    {'width': 230.8515625,
     'right': 306.8515625,
     'top': 492.0,
     'left': 76.0,
     'bottom': 549.0,
     'height': 57.0}
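
A possible variation on the approach above, not part of the original answer: since min and max accept any iterable, each key can be reduced straight from the list of boxes without building the intermediate temp lists. The sketch below assumes boxes is a non-empty list (it is iterated once per key, and min/max raise ValueError on empty input) and that every dictionary contains all four edge keys. The names bounding_box, FUNCTIONS and sample are hypothetical, and sample is just two of the boxes from the question trimmed to their edge keys. Whether this is actually faster on real data has not been measured here; timing both versions with timeit would settle it.

```python
from operator import itemgetter

# Same aggregation map as `functions` in the answer above.
FUNCTIONS = {'left': min, 'right': max, 'top': min, 'bottom': max}


def bounding_box(boxes):
    """Return the box that encloses every box in a non-empty list of dicts."""
    # map() yields one value at a time, so no per-key list is stored.
    out = {k: f(map(itemgetter(k), boxes)) for k, f in FUNCTIONS.items()}
    # The size is derived from the aggregated edges, as in step (iii).
    out['height'] = out['bottom'] - out['top']
    out['width'] = out['right'] - out['left']
    return out


# Two boxes taken from the question, reduced to their edge keys.
sample = [
    {'left': 95.0, 'right': 273.25, 'top': 535.0, 'bottom': 549.0},
    {'left': 76.0, 'right': 306.8515625, 'top': 492.0, 'bottom': 506.0},
]
result = bounding_box(sample)
```

The trade-off is that the list is traversed four times instead of once, in exchange for not storing a copy of every value.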