Search code examples
pythonpandasdictionarygroup-by

How to merge dictionaries contained in a Pandas dataframe as a groupby operation


Let us consider a pandas dataframe df containing dictionaries in one of its columns (column mydict):

      mystring    mydict 
0     a           {'key1': 'value1'} 
1     a           {'key2': 'value2'}
2     b           {'key2': 'value2'}

I would like to "merge" the dictionaries as part of a groupby operation, e.g. df.groupby('mystring')['mydict'].apply(lambda x: merge_dictionaries(x)).

The expected output for mystring 'a' would be: {'key1': 'value1', 'key2': 'value2'}

The usual way of combining dictionaries is to update an existing dictionary (dict1.update(dict2)), so I am not sure how to implement this here.

Here is the code snippet to create the dataframe used in this example:

mycolumns = ['mystring', 'mydict']
df = pd.DataFrame(columns = mycolumns)
df = df.append(pd.DataFrame(['a', {'key1':'value1'}],mycolumns).T)
df = df.append(pd.DataFrame(['a', {'key2':'value2'}],mycolumns).T)
df = df.append(pd.DataFrame(['b', {'key3':'value3'}],mycolumns).T)
df = df.reset_index(drop=True)

EDIT: a way to achieve a similar result without using groupby would be to iterate over the mystring and then update the dictionary:

merged_dict = {}
for mystring in df.mystring.unique():
    for mydict in df[df.mystring==mystring]['mydict']:
        print(mydict)
        merged_dict.update(mydict)

EDIT 2: append is no longer available in pandas 2.0.1. Here is the alternate snippet to create the example dataframe using concat:

mycolumns = ['mystring', 'mydict']
df = pd.DataFrame(columns = mycolumns)
df = pd.concat([df, pd.DataFrame(['a', {'key1':'value1'}],mycolumns).T])
df = pd.concat([df, pd.DataFrame(['a', {'key2':'value2'}],mycolumns).T])
df = pd.concat([df, pd.DataFrame(['b', {'key3':'value3'}],mycolumns).T])
df = df.reset_index(drop=True)

Solution

  • Try:

    from functools import reduce
    from operator import or_
    
    out = df.groupby("mystring", as_index=False)["mydict"].agg(lambda x: reduce(or_, x))
    
    print(out)
    

    Prints:

      mystring                                mydict
    0        a  {'key1': 'value1', 'key2': 'value2'}
    1        b                    {'key3': 'value3'}
    

    Or:

    out = df.groupby("mystring", as_index=False)["mydict"].agg(
        lambda x: reduce(dict.__or__, x)
    )
    

    Or:

    out = df.groupby("mystring", as_index=False)["mydict"].agg(
        lambda x: reduce(lambda a, b: {**a, **b}, x)
    )
    print(out)
    

    Or:

    out = df.groupby("mystring", as_index=False)["mydict"].agg(
        lambda x: reduce(lambda a, b: dict(**a, **b), x)
    )
    print(out)