Search code examples
python-3.xpandasdataframe

Combine list of dataframes into one big dataframe avoiding duplicates on columns and indices


Multiple data points are in a list. I want to combine them into one pandas DataFrame. Minimal example:

list_of_frames = [pd.DataFrame({'name':'adam', 'height':'180'}, index=[0]), pd.DataFrame({'name':'adam', 'weight':'80'}, index=[1]), pd.DataFrame({'name':'eve', 'height':'190'}, index=[2])]

How do I obtain the following DataFrame?

    name    height  weight
0   adam    180     80
1   eve     190     NaN

If I call pd.concat(list_of_frames) I obtain a list of entries

    name    height  weight
0   adam    180     NaN
1   adam    NaN     80
2   eve     190     NaN

Obviously the height variable has been 'merged'. Can I collapse this DataFrame?

Alternatively I tried reduce(lambda l, r: pd.merge(l, r, on='name', how='outer'), list_of_frames) which leads to

    name    height_x    weight  height_y
0   adam    180     80  NaN
1   eve     NaN     NaN     190

Here we have separate column names. I feel like I am missing something obvious. Thanks for the help!


Solution

  • If you always have single rows DataFrames as input, "name" acts as unique key, use groupby.first:

    pd.concat(list_of_frames).groupby('name', as_index=False).first()
    

    Output:

       name height weight
    0  adam    180     80
    1   eve    190   None