Suppose I have a DataFrame like this:
column1 | column2 | column3 |
1 | 2023-02-21 | NaN |
1 | 2023-02-22 | NaN |
1 | 2023-02-23 | 8 |
1 | 2023-02-24 | NaN |
1 | 2023-02-24 | NaN |
1 | 2023-02-24 | NaN |
1 | 2023-02-24 | NaN |
1 | 2023-02-24 | 10 |
2 | 2023-02-25 | NaN |
2 | 2023-02-26 | 9 |
Is there a way to get the following result DataFrame?
column1 | column2 | column3 | result
1 | 2023-02-21 | NaN | 3
1 | 2023-02-22 | NaN | 3
1 | 2023-02-23 | 8 | 3
1 | 2023-02-24 | NaN | 3
1 | 2023-02-24 | NaN | 3
1 | 2023-02-24 | NaN | 3
1 | 2023-02-24 | NaN | 3
1 | 2023-02-24 | 10 | 3
2 | 2023-02-25 | NaN | 2
2 | 2023-02-26 | 9 | 2
The only approach I can think of is counting the NaNs in column3 for each column1 value using pandas. Any help would be greatly appreciated. Thanks.
I'm not sure this is your exact logic, but maybe:
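# Flag NaNs, then per column1 group count the leading run of NaNs
# (cummin turns False at the first non-NaN value) and add 1.
df['result'] = (df['column3'].isna()
                .groupby(df['column1'])
                .transform(lambda x: x.cummin().sum() + 1)
               )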
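To check the idea against the sample data, here is a self-contained sketch. It assumes the intended logic is "number of NaNs before the first non-NaN value in each column1 group, plus one", which is only a guess that happens to match the expected output. The helper name `leading_nan_count_plus_one` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
    "column2": pd.to_datetime([
        "2023-02-21", "2023-02-22", "2023-02-23", "2023-02-24", "2023-02-24",
        "2023-02-24", "2023-02-24", "2023-02-24", "2023-02-25", "2023-02-26",
    ]),
    "column3": [np.nan, np.nan, 8, np.nan, np.nan, np.nan, np.nan, 10, np.nan, 9],
})


def leading_nan_count_plus_one(is_nan: pd.Series) -> int:
    # Hypothetical helper: cummin stays True only while every value seen
    # so far is NaN, so its sum counts the NaNs before the first real value.
    return int(is_nan.cummin().sum()) + 1


# transform broadcasts the per-group scalar back onto every row of the group.
df["result"] = (
    df["column3"].isna()
    .groupby(df["column1"])
    .transform(leading_nan_count_plus_one)
)

print(df)
```

For the sample data this gives 3 for every row where column1 is 1 and 2 for every row where column1 is 2. Note that a group with no non-NaN value at all would get its group size plus one, which may or may not be what you want.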