Search code examples
pythonpandasaggregateaggregate-functions

groupby with nth() and agg


I have a dataframe and would like to groupby ìndex1 and get new columns for the first, 2nd, 3rd, fourth and last values of column A (if they exist). So I would like to use agg so I can aggregate all at once. But somehow that does not work with nth. What am I doing wrong?

df = pd.DataFrame({'index1':[1,2,2,2,3,3], 'A':[1,1,2,3,1,4]})
df.groupby(['index1']).agg({'A': ['first',
                                 lambda x: x.nth(1),
                                 lambda x: x.nth(2),
                                 lambda x: x.nth(3),
                                 'last']
}).reset_index()

so given this dataframe

  index1  A
0       1     1
1       2     1
2       2     2
3       2     3
4       3     1
5       3     4

I would like to get is this

   index1  first second thrid fourth  last
0       1      1                         1
1       2      1      2     3            3
2       3      1      4                  4

Solution

  • You can combine pivot and groupby.last:

    g = df.groupby('index1')
    
    (df.assign(col=g.cumcount().add(1))
       .pivot(index='index1', columns='col', values='A')
       .assign(last=g.last())
       .reset_index()
    )
    

    output:

    col  index1    1    2    3  last
    0         1  1.0  NaN  NaN     1
    1         2  1.0  2.0  3.0     3
    2         3  1.0  4.0  NaN     4
    

    For named columns:

    names = {1: 'first', 2: 'second', 3: 'third', 4: 'fourth'}
    
    g = df.groupby('index1')
    
    out = (df
       .assign(col=g.cumcount().add(1))
       .pivot(index='index1', columns='col', values='A')
       .reindex(columns=list(names)).rename(columns=names)
       .assign(last=g.last())
       .reset_index()
    )
    

    output:

    col  index1  first  second  third  fourth  last
    0         1      1     NaN    NaN     NaN     1
    1         2      1     2.0    3.0     NaN     3
    2         3      1     4.0    NaN     NaN     4