Let us consider a pandas dataframe df
containing dictionaries in one of its columns (column mydict
):
mystring mydict 0 a {'key1': 'value1'} 1 a {'key2': 'value2'} 2 b {'key2': 'value2'}
I would like to "merge" the dictionaries as part of a groupby operation, e.g. df.groupby('mystring')['mydict'].apply(lambda x: merge_dictionaries(x))
.
The expected output for mystring
'a' would be: {'key1': 'value1', 'key2': 'value2'}
The usual way of combining dictionaries is to update an existing dictionary (dict1.update(dict2)
), so I am not sure how to implement this here.
Here is the code snippet to create the dataframe used in this example:
mycolumns = ['mystring', 'mydict']
df = pd.DataFrame(columns = mycolumns)
df = df.append(pd.DataFrame(['a', {'key1':'value1'}],mycolumns).T)
df = df.append(pd.DataFrame(['a', {'key2':'value2'}],mycolumns).T)
df = df.append(pd.DataFrame(['b', {'key3':'value3'}],mycolumns).T)
df = df.reset_index(drop=True)
EDIT: a way to achieve a similar result without using groupby would be to iterate over the mystring
and then update the dictionary:
merged_dict = {}
for mystring in df.mystring.unique():
for mydict in df[df.mystring==mystring]['mydict']:
print(mydict)
merged_dict.update(mydict)
EDIT 2: append
is no longer available in pandas 2.0.1. Here is the alternate snippet to create the example dataframe using concat
:
mycolumns = ['mystring', 'mydict']
df = pd.DataFrame(columns = mycolumns)
df = pd.concat([df, pd.DataFrame(['a', {'key1':'value1'}],mycolumns).T])
df = pd.concat([df, pd.DataFrame(['a', {'key2':'value2'}],mycolumns).T])
df = pd.concat([df, pd.DataFrame(['b', {'key3':'value3'}],mycolumns).T])
df = df.reset_index(drop=True)
Try:
from functools import reduce
from operator import or_
out = df.groupby("mystring", as_index=False)["mydict"].agg(lambda x: reduce(or_, x))
print(out)
Prints:
mystring mydict
0 a {'key1': 'value1', 'key2': 'value2'}
1 b {'key3': 'value3'}
Or:
out = df.groupby("mystring", as_index=False)["mydict"].agg(
lambda x: reduce(dict.__or__, x)
)
Or:
out = df.groupby("mystring", as_index=False)["mydict"].agg(
lambda x: reduce(lambda a, b: {**a, **b}, x)
)
print(out)
Or:
out = df.groupby("mystring", as_index=False)["mydict"].agg(
lambda x: reduce(lambda a, b: dict(**a, **b), x)
)
print(out)