python pandas conditional-statements countif cumsum

Pandas cumulative countif (based on condition)

I have a DataFrame df and I am trying to calculate a cumulative count based on the condition that the date in the column at is bigger or equal to the dates in the column recovery_date.

Here is the original df:

    at  recovery_date   
0   2020-02-01  2020-03-02
1   2020-03-01  2020-03-31
2   2020-04-01  2020-05-01
3   2020-05-01  2020-05-31
4   2020-06-01  2020-07-01

Here is the desired outcome:

    at  recovery_date   result
0   2020-02-01  2020-03-02  0
1   2020-03-01  2020-03-31  0
2   2020-04-01  2020-05-01  2
3   2020-05-01  2020-05-31  3
4   2020-06-01  2020-07-01  4

The interpretation is that for each at there are x amount of recovery_dates preceding it or on the same day.

I am trying to avoid using a for loop as I am implementing this for a time-sensitive application.

This is a solution I was able to find, however I am looking for something more performant:

def how_many(at: pd.Timestamp, recoveries: pd.Series) -> int:
    return (at >= recoveries).sum()

df["result"] = [how_many(row["at"], df["recovery_date"][:idx]) for idx, row in df.iterrows()]

Thanks a lot!!

Solution

You're looking for something like this:

df['result'] = df['at'].apply(lambda at: (at >= df['recovery_date']).sum())

What this does is: For each value in the at column, check if there are any recovery_dates that are bigger or equal (at this point we have an array of True (=1) and False (=0) values) then sum them.

This yields your desired output

          at recovery_date  count  result
0 2020-02-01    2020-03-02      1       0
1 2020-03-01    2020-03-31      1       0
2 2020-04-01    2020-05-01      1       2
3 2020-05-01    2020-05-31      1       3
4 2020-06-01    2020-07-01      1       4