Multiple data points are in a list. I want to combine them into one pandas DataFrame. Minimal example:
list_of_frames = [pd.DataFrame({'name':'adam', 'height':'180'}, index=[0]), pd.DataFrame({'name':'adam', 'weight':'80'}, index=[1]), pd.DataFrame({'name':'eve', 'height':'190'}, index=[2])]
How do I obtain the following DataFrame?
name height weight
0 adam 180 80
1 eve 190 NaN
If I call pd.concat(list_of_frames)
I obtain a list of entries
name height weight
0 adam 180 NaN
1 adam NaN 80
2 eve 190 NaN
Obviously the height
variable has been 'merged'. Can I collapse this DataFrame?
Alternatively I tried reduce(lambda l, r: pd.merge(l, r, on='name', how='outer'), list_of_frames)
which leads to
name height_x weight height_y
0 adam 180 80 NaN
1 eve NaN NaN 190
Here we have separate column names. I feel like I am missing something obvious. Thanks for the help!
If you always have single rows DataFrames as input, "name" acts as unique key, use groupby.first
:
pd.concat(list_of_frames).groupby('name', as_index=False).first()
Output:
name height weight
0 adam 180 80
1 eve 190 None