How to get the number of rows between the current row and the last row where a condition was met? - Pandas


For example, in the following DataFrame, column 'b' holds the number of rows since column 'a' was last True:

     a       b
0    True    0
1    False   1
2    True    0
3    False   1
4    False   2
5    False   3

Currently I use the code below to make this work, but because it relies on a loop, it is very slow.

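import numpy as np
import pandas as pd

# cond is the boolean condition on the rows of data (both defined elsewhere)
a = np.where(cond)[0]  # positions of the rows where the condition holds
b = np.array([], dtype=np.int64)
s = 0  # rows counted since the last match
for i in range(len(data)):
    if i in a:
        b = np.append(b, 0)
        s = 0
    else:
        b = np.append(b, s)
    s += 1
data['b'] = pd.Series(b).ffill().fillna(-1)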

Is there a faster way to do this without using a for loop?


Solution

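  • IIUC, you can use groupby with cumcount. The cumulative sum of 'a' increases by one at every True, so it gives each run of rows that starts at a True its own group id, and cumcount numbers the rows inside each group starting from 0: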

    df['b'] = df.groupby(df['a'].cumsum()).cumcount()
    print(df)
    
    # Output
           a  b
    0   True  0
    1  False  1
    2   True  0
    3  False  1
    4  False  2
    5  False  3
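
If the column can start with False rows, the one-liner above counts them from 0 as well, because they fall into the group whose cumulative sum is 0. The sketch below shows one way to handle that case. The function name `rows_since_last_true` and the `before_first` parameter are hypothetical, and returning -1 for rows before the first True is an assumption based on the `fillna(-1)` in the question.

```python
import pandas as pd


def rows_since_last_true(flags: pd.Series, before_first: int = -1) -> pd.Series:
    """Count the rows since the most recent True in a boolean Series."""
    flags = flags.astype(bool)
    # Increases by 1 at each True, so every run starting at a True shares one id.
    run_id = flags.cumsum()
    # Position of each row inside its run: 0 at the True row, then 1, 2, ...
    counts = flags.groupby(run_id).cumcount()
    # Rows before the first True (run_id == 0) have no previous match.
    return counts.where(run_id > 0, before_first)


# Same pattern as in the question, with one extra leading False row.
df = pd.DataFrame({"a": [False, True, False, True, False, False, False]})
df["b"] = rows_since_last_true(df["a"])
print(df["b"].tolist())  # [-1, 0, 1, 0, 1, 2, 3]
```

Both `cumsum` and `cumcount` are vectorized, so this avoids the Python-level loop and the repeated `np.append` calls, each of which copies the whole array.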