Search code examples
pandasdataframepandas-groupby

How to merge multiple row into one row given two column?


Suppose we have the dataframe like below.

   ID  time   feature1   feature2  feature3
0   1   1      10.2        2.3        60
1   2   3       8.9        -1.2       90
3   1   3       3.9        6.2        80

I want to merge all the rows corresponding to each ID into one row with the feature name modified by time column. For example, let's take ID = 1 for example

Before merge:

   ID  time   feature1   feature2  feature3
0   1   1      10.2        2.3        60
3   1   3       3.9        6.2        80

After merge:

   ID  feature1_1  feature2_1  feature3_1  feature1_3  feature2_3  feature3_1
0   1    10.2         2.3          60         3.9         6.2          80

One of the problem is that, for the time column, lets say the complete set would be [1, 3, 5, 7, 9]. However, for each ID, it might contains only subset of [1, 3, 5, 7, 9], for example, [1, 3, 5]

So the final result would looks like follows

   ID  feature1_1  feature2_1  feature3_1  feature1_3  feature2_3  feature3_3
0   1    10.2         2.3          60         3.9         6.2          80
...
1  168    8.7         6.2          45         NaN         NaN          NaN

Solution

  • You can use melt to flatten your dataframe then process data and finally reshape your dataframe with pivot:

    rename_feature = lambda x: x['feature'] + '_' + x.groupby(['ID', 'feature']).cumcount().add(1).astype(str)
    
    out = (df.melt(['ID', 'time'], var_name='feature')
             .assign(feature=rename_feature)
             .pivot('ID', 'feature', 'value')
             .reset_index().rename_axis(columns=None))
    

    Output:

    >>> out
       ID  feature1_1  feature1_2  feature2_1  feature2_2  feature3_1  feature3_2
    0   1        10.2         3.9         2.3         6.2        60.0        80.0
    1   2         8.9         NaN        -1.2         NaN        90.0         NaN