Groupby streak of numbers and a mask


This is my pandas dataframe:

import pandas as pd
df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10], 'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

This is the output I need after using groupby. Each group should contain a streak of consecutive 1s in b, followed by the first two -1s that come after that streak. If fewer than two -1s follow the streak, the group takes only the ones available, as in the second group below, which ends with a single -1:

     a  b
0   10  1
1   20  1
2    1  1
3   55 -1
4   66 -1

     a  b
7    1  1
8    2  1
9   10 -1

I know that I need a mask. I have tried a few, but they didn't work. These are some of my attempts:

df.groupby(df.b.diff().cumsum().eq(1))
df.groupby(df['b'].ne(df['b'].shift()).cumsum())
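
For reference, a minimal sketch of the keys these two attempts produce on the sample data, which may help show why neither gives the desired groups. The names `key1` and `key2` are only illustrative and do not appear in the original attempts.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})

# first attempt: the cumulative sum of the differences never equals 1 here,
# so the key is False on every row and groupby yields a single group
key1 = df['b'].diff().cumsum().eq(1)

# second attempt: a new id starts at every change of value, so each run of 1s
# and each run of -1s ends up in its own group
key2 = df['b'].ne(df['b'].shift()).cumsum()

print(pd.DataFrame({'b': df['b'], 'key1': key1, 'key2': key2}))
```

The second key separates the runs correctly, but it puts the -1s in a different group from the 1s that precede them, and it does not limit how many -1s are kept.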

Solution

  • The logic is not fully clear.

    Assuming you want to start a new group whenever "b" switches from -1 to 1, and adding a condition on "a" for the demo:

    # restart group on -1 -> 1
    group = df['b'].diff().gt(0).cumsum()
    # drop rows with a≥100
    m = df['a'].lt(100)
    
    for k,g in df[m].groupby(group):
        print(g)
    

    Output:

    
        a  b
    0  10  1
    1  20  1
    2   1  1
    3  55 -1
    4  66 -1
        a  b
    7   1  1
    8   2  1
    9  10 -1
    

    Update: to keep every 1 of each streak plus at most the first two -1s that follow it:

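    # True where b == 1
    m = df['b'].eq(1)
    # a new group starts at the first 1 of each streak
    group = (m&m.ne(m.shift())).cumsum()
    # number of 1s in each group
    n = m.groupby(group).sum()
    
    # keep the 1s plus at most two trailing -1s per group
    df.groupby(group, group_keys=False).apply(lambda g: g.head(n[g.name]+2))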
    

    Output:

    
        a  b
    0  10  1
    1  20  1
    2   1  1
    3  55 -1
    4  66 -1
    7   1  1
    8   2  1
    9  10 -1
    

    As a for loop:

    m = df['b'].eq(1)
    group = (m&m.ne(m.shift())).cumsum()
    n = m.groupby(group).sum()
    
    for k, g in df.groupby(group, group_keys=False):
        print(k)
        print(g.head(n[k]+2))
    

    Output:

    1
        a  b
    0  10  1
    1  20  1
    2   1  1
    3  55 -1
    4  66 -1
    2
        a  b
    7   1  1
    8   2  1
    9  10 -1
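
The same selection can also be written without `apply`, as a single boolean mask built from each row's position inside its group. The following is only a sketch under a few assumptions: `streak_with_tail` and its `col` and `tail` parameters are hypothetical names, not part of pandas, and rows that come before the first streak of 1s are dropped, which the question does not specify.

```python
import pandas as pd

def streak_with_tail(df, col="b", tail=2):
    """Keep each streak of 1s in `col` plus at most `tail` rows after it.

    Hypothetical helper for illustration, not part of pandas.
    """
    m = df[col].eq(1)
    # the group id increases at the first 1 of every streak;
    # rows before the first streak get id 0
    group = (m & m.ne(m.shift())).cumsum()
    # number of 1s in each row's group, aligned to the rows
    n_ones = m.astype(int).groupby(group).transform("sum")
    # position of each row inside its group (0, 1, 2, ...)
    pos = df.groupby(group).cumcount()
    # assumption: rows before the first streak (group 0) are dropped
    return df[pos.lt(n_ones + tail) & group.gt(0)]

df = pd.DataFrame({'a': [10, 20, 1, 55, 66, 333, 444, 1, 2, 10],
                   'b': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1]})
out = streak_with_tail(df)
print(out)
```

On the sample data this keeps rows 0 to 4 and 7 to 9, matching the output above. Changing `tail` adjusts how many rows after each streak are kept.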