I have a dataframe,
df = pd.DataFrame([["A",1],["A",2],["A",3],["B",4],["C",5],["C",6],["C",7],["C",8],["C",9],["C",10]],columns=["id","val"])
id val
A 1
A 2
A 3
B 4
C 5
C 6
C 7
C 8
C 9
C 10
I need to create a new column which labels every n rows at group level of id column in an incremental order. In this example n=2. So after every 2 rows of same id increase the count in grp column.
Expected output:
df_out = pd.DataFrame([["A",1,1],["A",2,1],["A",3,2],["B",4,1],["C",5,1],["C",6,1],["C",7,2],["C",8,2],["C",9,3],["C",10,3]],columns=["id","val","grp"])
id val grp
A 1 1
A 2 1
A 3 2
B 4 1
C 5 1
C 6 1
C 7 2
C 8 2
C 9 3
C 10 3
How to do it in pandas
Use GroupBy.cumcount
with integer division by n
and add 1
because python/pandas counts from 0
:
n = 2
df['grp'] = df.groupby('id').cumcount() // n + 1
print (df)
id val grp
0 A 1 1
1 A 2 1
2 A 3 2
3 B 4 1
4 C 5 1
5 C 6 1
6 C 7 2
7 C 8 2
8 C 9 3
9 C 10 3