Search code examples
pythonpandasdata-analysis

How to identify nth row that satisfies condition


I have a dataframe with months and sales. If somebody does 250 sales for 3 consecutive months, they get a bonus.

  1. Once the bonus month is identified, the counter resets and starts to count the next 3 consecutive months with 250
  2. if there is less than 250 sales, then the counter resets
Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 1
Month 4 250
Month 5 250
Month 6 250 1
Month 7 0
Month 8 250
Month 9 250
Month 10 250 1
Month 11 250
Month 12 250
Month 13 0
Month 14 250
Month 15 250
Month 16 250 1

I want to populate the 3 month consistency column

Tried using cum count with conditions but it did not work and I could not find such an example anywhere. What I tried was to use rolling window to identify every 3rd consecutive record with 250. But then it does not reset after the each set of 3 consecutive 250s.

df.groupby('id').apply(lambda x: x['gt_250'].rolling(min_periods=1, window=3).sum())
#-- grouped by id because this has to be computed for each salesperson_id

The result I got from it was,

Month sales 3 month consistency
Month 1 250
Month 2 250
Month 3 250 3
Month 4 250 3
Month 5 250 3

Solution

  • You could use a custom groupby.cumcount and mod:

    N = 3 # number of consecutive months
    
    # identify target months
    m = df['sales'].ge(250)
    # for groups of consecutive target months
    group = (~m).cumsum()
    
    # count the Nth repeat for each group
    df['3 month consistency'] = (df[m].groupby(group).cumcount().mod(N).eq(N-1)
                                 .map({True: 1})
                                )
    

    NB. if you want a boolean column replace .map({True: 1}) by .reindex(df.index).eq(True).

    Output:

           Month  sales  3 month consistency
    0    Month 1    250                  NaN
    1    Month 2    250                  NaN
    2    Month 3    250                  1.0
    3    Month 4    250                  NaN
    4    Month 5    250                  NaN
    5    Month 6    250                  1.0
    6    Month 7      0                  NaN
    7    Month 8    250                  NaN
    8    Month 9    250                  NaN
    9   Month 10    250                  1.0
    10  Month 11    250                  NaN
    11  Month 12    250                  NaN
    12  Month 13      0                  NaN
    13  Month 14    250                  NaN
    14  Month 15    250                  NaN
    15  Month 16    250                  1.0
    

    Intermediates:

           Month  sales      m  group  cumcount  mod  3 month consistency
    0    Month 1    250   True      0       0.0  0.0                  NaN
    1    Month 2    250   True      0       1.0  1.0                  NaN
    2    Month 3    250   True      0       2.0  2.0                  1.0
    3    Month 4    250   True      0       3.0  0.0                  NaN
    4    Month 5    250   True      0       4.0  1.0                  NaN
    5    Month 6    250   True      0       5.0  2.0                  1.0
    6    Month 7      0  False      1       NaN  NaN                  NaN
    7    Month 8    250   True      1       0.0  0.0                  NaN
    8    Month 9    250   True      1       1.0  1.0                  NaN
    9   Month 10    250   True      1       2.0  2.0                  1.0
    10  Month 11    250   True      1       3.0  0.0                  NaN
    11  Month 12    250   True      1       4.0  1.0                  NaN
    12  Month 13      0  False      2       NaN  NaN                  NaN
    13  Month 14    250   True      2       0.0  0.0                  NaN
    14  Month 15    250   True      2       1.0  1.0                  NaN
    15  Month 16    250   True      2       2.0  2.0                  1.0
    

    Variant (with a boolean output):

    N = 3
    m = df['sales'].ge(250)
    group = m.ne(m.shift()).cumsum()
    df['3 month consistency'] = df.groupby(group).cumcount().mod(N).eq(N-1)
    

    Intermediates:

           Month  sales      m  group  cumcount  mod  3 month consistency
    0    Month 1    250   True      1         0    0                False
    1    Month 2    250   True      1         1    1                False
    2    Month 3    250   True      1         2    2                 True
    3    Month 4    250   True      1         3    0                False
    4    Month 5    250   True      1         4    1                False
    5    Month 6    250   True      1         5    2                 True
    6    Month 7      0  False      2         0    0                False
    7    Month 8    250   True      3         0    0                False
    8    Month 9    250   True      3         1    1                False
    9   Month 10    250   True      3         2    2                 True
    10  Month 11    250   True      3         3    0                False
    11  Month 12    250   True      3         4    1                False
    12  Month 13      0  False      4         0    0                False
    13  Month 14    250   True      5         0    0                False
    14  Month 15    250   True      5         1    1                False
    15  Month 16    250   True      5         2    2                 True
    

    per ID

    N = 3
    m = df['sales'].ge(250)
    group = m.ne(m.shift()).cumsum()
    df['3 month consistency'] = df.groupby(['ID', group]).cumcount().mod(N).eq(N-1)
    

    Example:

       ID     Month  sales  3 month consistency
    0   A   Month 1    250                False
    1   A   Month 2    250                False
    2   A   Month 3    250                 True
    3   A   Month 4    250                False
    4   A   Month 5    250                False
    5   A   Month 6    250                 True
    6   A   Month 7      0                False
    7   A   Month 8    250                False
    8   A   Month 9    250                False
    9   A  Month 10    250                 True
    10  A  Month 11    250                False
    11  A  Month 12    250                False
    12  A  Month 13      0                False
    13  A  Month 14    250                False
    14  A  Month 15    250                False
    15  B   Month 1    250                False  # independent from A
    16  B   Month 2    250                False
    17  B   Month 3    250                 True
    18  B   Month 4    250                False
    19  B   Month 5    250                False
    20  B   Month 6    250                 True
    21  B   Month 7      0                False
    22  B   Month 8    250                False
    23  B   Month 9    250                False
    24  B  Month 10    250                 True
    25  B  Month 11    250                False
    26  B  Month 12    250                False
    27  B  Month 13      0                False
    28  B  Month 14    250                False
    29  B  Month 15    250                False