Search code examples
pandasdataframegroup-bycounter

Increment counter the first time a number is reached in a groupby


I found this very usefull post here and I'm trying to do same in a groupby df...

Here is the original post with an incremental counter every time df['step'] has a 6 : link

In my case I want to increment this counter everytime 1 occures

So I modified this request :

df['counter'] = ((df['step']==6) & (df.shift(1)['step']!=6 )).cumsum()

like this :

df['counter_2'] = ((df['counter1'] == 1) & (df.shift(1)['counter1'] != 1)).cumsum()

And now I try to make this in a grouping by ('prd_id') this counter_2 grouping by 'product_A', 'product_B', etc


Solution

  • updated answer

    df['counter'] = df['step'].eq(1).groupby(df['prd_id']).cumsum()
    

    Output:

       prd_id  step  counter
    0       A     1        1
    1       A     2        1
    2       A     3        1
    3       A     4        1
    4       A     1        2
    5       A     2        2
    6       B     1        1
    7       B     1        2
    8       B     2        2
    9       B     1        3
    10      B     2        3
    11      B     3        3
    

    original answer

    You can use duplicated, the boolean NOT (~), and cumsum:

    df['counter'] = (~df['step'].duplicated()).cumsum()
    

    Output:

        step  counter
    0      2        1
    1      2        1
    2      2        1
    3      3        2
    4      4        3
    5      4        3
    6      5        4
    7      6        5
    8      6        5
    9      6        5
    10     6        5
    11     7        6
    12     5        6  # not incrementing, 5 was seen above
    13     6        6  # not incrementing, 6 was seen above
    14     6        6
    15     6        6
    16     7        6  # not incrementing, 7 was seen above
    17     5        6  # not incrementing, 5 was seen above
    18     6        6  # not incrementing, 6 was seen above
    19     7        6  # not incrementing, 7 was seen above
    20     5        6  # not incrementing, 5 was seen above
    

    If you also have groups, use:

    df['counter'] = (~df[['step', 'group']].duplicated()).groupby(df['group']).cumsum()
    

    Example:

       group  step  counter
    0      A     1        1
    1      A     2        2
    2      A     2        2
    3      A     3        3
    4      A     2        3
    5      A     4        4
    6      B     1        1  # first time in B
    7      B     1        1
    8      B     2        2
    9      B     1        2  # duplicated in B
    10     B     2        2
    11     B     3        3