I have a dataframe and would like to groupby ìndex1
and get new columns for the first, 2nd, 3rd, fourth and last values of column A
(if they exist). So I would like to use agg so I can aggregate all at once. But somehow that does not work with nth. What am I doing wrong?
df = pd.DataFrame({'index1':[1,2,2,2,3,3], 'A':[1,1,2,3,1,4]})
df.groupby(['index1']).agg({'A': ['first',
lambda x: x.nth(1),
lambda x: x.nth(2),
lambda x: x.nth(3),
'last']
}).reset_index()
so given this dataframe
index1 A
0 1 1
1 2 1
2 2 2
3 2 3
4 3 1
5 3 4
I would like to get is this
index1 first second thrid fourth last
0 1 1 1
1 2 1 2 3 3
2 3 1 4 4
You can combine pivot
and groupby.last
:
g = df.groupby('index1')
(df.assign(col=g.cumcount().add(1))
.pivot(index='index1', columns='col', values='A')
.assign(last=g.last())
.reset_index()
)
output:
col index1 1 2 3 last
0 1 1.0 NaN NaN 1
1 2 1.0 2.0 3.0 3
2 3 1.0 4.0 NaN 4
For named columns:
names = {1: 'first', 2: 'second', 3: 'third', 4: 'fourth'}
g = df.groupby('index1')
out = (df
.assign(col=g.cumcount().add(1))
.pivot(index='index1', columns='col', values='A')
.reindex(columns=list(names)).rename(columns=names)
.assign(last=g.last())
.reset_index()
)
output:
col index1 first second third fourth last
0 1 1 NaN NaN NaN 1
1 2 1 2.0 3.0 NaN 3
2 3 1 4.0 NaN NaN 4