Search code examples
pythonpandasdataframegroup-by

Label every n rows of the group in an incremental order in pandas


I have a dataframe,

df = pd.DataFrame([["A",1],["A",2],["A",3],["B",4],["C",5],["C",6],["C",7],["C",8],["C",9],["C",10]],columns=["id","val"])
id  val
A   1
A   2
A   3
B   4
C   5
C   6
C   7
C   8
C   9
C   10

I need to create a new column which labels every n rows at group level of id column in an incremental order. In this example n=2. So after every 2 rows of same id increase the count in grp column.

Expected output:

df_out = pd.DataFrame([["A",1,1],["A",2,1],["A",3,2],["B",4,1],["C",5,1],["C",6,1],["C",7,2],["C",8,2],["C",9,3],["C",10,3]],columns=["id","val","grp"])
id  val grp
A   1   1
A   2   1
A   3   2
B   4   1
C   5   1
C   6   1
C   7   2
C   8   2
C   9   3
C   10  3

How to do it in pandas


Solution

  • Use GroupBy.cumcount with integer division by n and add 1 because python/pandas counts from 0:

    n = 2
    df['grp'] = df.groupby('id').cumcount() // n + 1  
    print (df)
      id  val  grp
    0  A    1    1
    1  A    2    1
    2  A    3    2
    3  B    4    1
    4  C    5    1
    5  C    6    1
    6  C    7    2
    7  C    8    2
    8  C    9    3
    9  C   10    3