I have a dataset including nested dictionaries that I would like to unpack and form a multiindex dataframe.
The dataframe should have columns for Years with sub columns for the Result Types in each year (yield and quality in this case).
Typical dataset:
datalist= [
{
'trial': 'efr1',
'location': 'aberdeen',
'2010': {'yield': '100', 'quality': '97'},
'2011': {'yield': '90', 'quality': '87'},
'2012': {'yield': '88', 'quality': '90'}
},
{
'trial': 'efr2',
'location': 'bristol',
'2010': {'yield': '88', 'quality': '90'},
'2011': {'yield': '75', 'quality': '82'},
'2012': {'yield': '77', 'quality': '80'}
},
{
'trial': 'axy1',
'location': 'newcastle',
'2010': {'yield': '91', 'quality': '95'},
'2011': {'yield': '93', 'quality': '93'},
'2012': {'yield': '75', 'quality': '97'}
}
]
Using Dataframe.from_dict() produces a table with embedded dictionaries elements which I cannot figure out how to unpack and split to subcolumns.
On the other hand using json_normalize() produces a flat table with compound headings, and I cannot figure out how to to convert that into a multiindex frame with the structure I need...
If I understand correctly, you could use pandas.json_normalize
, then convert the year.type
columns to a MultiIndex (with str.split
):
out = (
pd.json_normalize(datalist)
.set_index(['trial', 'location'])
.pipe(lambda d: d.set_axis(d.columns.str.split('.', expand=True), axis=1))
)
Variant:
out = pd.json_normalize(datalist).set_index(['trial', 'location'])
out.columns = out.columns.str.split('.', expand=True)
Output:
2010 2011 2012
yield quality yield quality yield quality
trial location
efr1 aberdeen 100 97 90 87 88 90
efr2 bristol 88 90 75 82 77 80
axy1 newcastle 91 95 93 93 75 97
Note that setting the first two columns as index is optional, but if you don't the MultiIndex will be:
trial location 2010 2011 2012
NaN NaN yield quality yield quality yield quality
0 efr1 aberdeen 100 97 90 87 88 90
1 efr2 bristol 88 90 75 82 77 80
2 axy1 newcastle 91 95 93 93 75 97
Intermediate after pd.json_normalize(datalist)
:
trial location 2010.yield 2010.quality 2011.yield 2011.quality 2012.yield 2012.quality
0 efr1 aberdeen 100 97 90 87 88 90
1 efr2 bristol 88 90 75 82 77 80
2 axy1 newcastle 91 95 93 93 75 97