Python Pandas Conditional Calculation row by row


I have a dataset with a column A of scores. I want to compute a column B that counts for how many consecutive rows the score has been equal to or higher than the score in the previous row. If the score is lower than in the previous row, B should be reset to 0 for that row.

I tried the following:

import pandas as pd
df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

df['B'] = 0
df.loc[df['A'].diff() >= 0, 'B'] = df['B'].shift() + 1

The result is:

    A       B
0   140     0.0
1   145     1.0
2   148     1.0
3   150     1.0
4   100     0.0
5   105     1.0
6   106     1.0

I understand that the condition is checked for each row, but the right-hand side seems to be computed for the whole column at once. B therefore does not increment as expected: when df['B'].shift() is evaluated, every value in B is still 0, so each matching row gets 0 + 1.
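This can be checked by splitting the assignment into its parts. The sketch below uses the same data; the names rhs and mask are only illustrative and not part of the original attempt.

```python
import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})
df['B'] = 0

# The right-hand side is evaluated once, on the all-zero column:
# NaN, 1, 1, 1, 1, 1, 1 (it never sees any updated value of B).
rhs = df['B'].shift() + 1

# Rows where A did not drop compared with the previous row.
mask = df['A'].diff() >= 0

# The assignment only copies the precomputed values into the masked rows.
df.loc[mask, 'B'] = rhs
print(df)
```

Because rhs is fixed before anything is written to B, no row can build on the result of the row above it.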

How can I get the following result?

    A       B
0   140     0
1   145     1
2   148     2
3   150     3
4   100     0
5   105     1
6   106     2

Solution

  • You need cumsum to identify the blocks, and groupby().cumcount() to enumerate the rows within each block:

    s = df['A'].diff().ge(0)
    df['B'] = s.groupby((~s).cumsum()).cumcount()
    

    Output:

         A  B
    0  140  0
    1  145  1
    2  148  2
    3  150  3
    4  100  0
    5  105  1
    6  106  2
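To see why this works, it may help to look at the intermediate values. The following is a minimal sketch of the same approach with the steps split out; the names block and steps are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

# True where A is equal to or higher than the previous row.
# The first row is False because its diff is NaN.
s = df['A'].diff().ge(0)

# ~s is True at the first row and at every drop, so its running sum
# gives each non-decreasing run its own block id.
block = (~s).cumsum()

# cumcount numbers the rows inside each block starting at 0,
# which is exactly the reset-and-increment behaviour wanted for B.
df['B'] = s.groupby(block).cumcount()

# Side-by-side view of the intermediate steps.
steps = pd.DataFrame({'A': df['A'], 's': s, 'block': block, 'B': df['B']})
print(steps)
```

With this data the block ids come out as 1, 1, 1, 1, 2, 2, 2: the drop from 150 to 100 starts a new block, and the count restarts at 0 there.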