Search code examples
pythonpandasgroup-by

Groupby streak of numbers and a mask to groupby two rows after each group


This is my dataframe:

df = pd.DataFrame({'a': [20, 1, 55, 333, 444, 1, 2, 10], 'b': [20, 20, 21, 21, 21, 22, 22, 22]})

I want to group them by column b and two rows after each group. This is the output that I need:

    a   b
0  20   20
1   1   20
2   55  21
3  333  21

     a   b
2   55  21
3  333  21
4  444  21
5   1   22
6   2   22

    a   b
5   1   22
6   2   22
7  10   22

I know that I need a mask. I have tried some of them but didn't work. This is one of my tries:

df.groupby(df.b.diff().cumsum().eq(1))

Solution

  • You cannot achieve this just with a grouper as groups cannot overlap.

    You can however use a groupby with side effects by referencing the original DataFrame:

    (df.groupby('b')
       .apply(lambda g: df.loc[g.index[0]:g.index[-1]+2])
    )
    

    Output:

            a   b
    b            
    20 0   20  20
       1    1  20
       2   55  21
       3  333  21
    21 2   55  21
       3  333  21
       4  444  21
       5    1  22
       6    2  22
    22 5    1  22
       6    2  22
       7   10  22
    

    As a loop:

    for k, g in df.groupby('b'):
        print(k)
        print(df.loc[g.index[0]:g.index[-1]+2])
    

    Output:

    20
         a   b
    0   20  20
    1    1  20
    2   55  21
    3  333  21
    21
         a   b
    2   55  21
    3  333  21
    4  444  21
    5    1  22
    6    2  22
    22
        a   b
    5   1  22
    6   2  22
    7  10  22