Search code examples
pythonpandasdataframepandas-groupby

How to create a new column that increments within a subgroup of a group in Python?


I have a problem where I need to group the data by two groups, and attach a column that sort of counts the subgroup.

Example dataframe looks like this:

colA colB
 1     a 
 1     a  
 1     c  
 1     c  
 1     f  
 1     z 
 1     z  
 1     z  
 2     a  
 2     b  
 2     b
 2     b
 3     c  
 3     d  
 3     k  
 3     k 
 3     m  
 3     m  
 3     m      

Expected output after attaching the new column is as follows:

colA colB colC
 1     a    1
 1     a    1
 1     c    2
 1     c    2
 1     f    3
 1     z    4
 1     z    4
 1     z    4
 2     a    1
 2     b    2
 2     b    2
 2     b    2
 3     c    1  
 3     d    2 
 3     k    3    
 3     k    3 
 3     m    4
 3     m    4
 3     m    4

I tried the following but I cannot get this trivial looking problem solved:

Solution 1 I tried that does not give what I am looking for:

df['ONES']=1
df['colC']=df.groupby(['colA','colB'])['ONES'].cumcount()+1
df.drop(columns='ONES', inplace=True)

I also played with transform, and cumsum functions, and apply, but I cannot seem to solve this. Any help is appreciated.

Edit: minor error on dataframes.

Edit 2: For simplicity purposes, I showed similar values for column B, but the problem is within a larger group (indicated by colA), colB may be different and therefore, it needs to be grouped by both at the same time.

Edit 3: Updated dataframes to emphasize what I meant by my second edit. Hope this makes it more clear and reproduceable.


Solution

  • You could use groupby + ngroup:

    df['colC'] = df.groupby('colA').apply(lambda x: x.groupby('colB').ngroup()+1).droplevel(0)
    

    Output:

        colA colB  colC
    0      1    a     1
    1      1    a     1
    2      1    c     2
    3      1    c     2
    4      1    f     3
    5      1    z     4
    6      1    z     4
    7      1    z     4
    8      2    a     1
    9      2    b     2
    10     2    b     2
    11     2    b     2
    12     3    c     1
    13     3    d     2
    14     3    k     3
    15     3    k     3
    16     3    m     4
    17     3    m     4
    18     3    m     4