I have the following df. I want to group by "group" and either 1) drop every row up to and including the last NaN in each group, or 2) drop every row that precedes the last NaN, keeping the NaN row itself.
group     date  value
    1  jan2019    NaN
    1  jan2019      3
    1  jan2019    NaN
    1  feb2019      3
    1  mar2019      4
    1  mar2019      5
    2  feb2019      0
    2  feb2019    NaN
    2  mar2019      7
    2  mar2019      4
    2  apr2019      5
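For reference, the df can be rebuilt like this (a minimal sketch; keeping the dates as plain strings is an assumption based on the display above):
import numpy as np
import pandas as pd

# example data exactly as shown above
df = pd.DataFrame({
    'group': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'date': ['jan2019', 'jan2019', 'jan2019', 'feb2019', 'mar2019', 'mar2019',
             'feb2019', 'feb2019', 'mar2019', 'mar2019', 'apr2019'],
    'value': [np.nan, 3, np.nan, 3, 4, 5, 0, np.nan, 7, 4, 5],
})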
Desired df, version 1:
group     date  value
    1  feb2019      3
    1  mar2019      4
    1  mar2019      5
    2  mar2019      7
    2  mar2019      4
    2  apr2019      5
Desired df, version 2:
group     date  value
    1  jan2019    NaN
    1  feb2019      3
    1  mar2019      4
    1  mar2019      5
    2  feb2019    NaN
    2  mar2019      7
    2  mar2019      4
    2  apr2019      5
You can flag the NaN values with isna. To also flag every value that comes before them, reverse the order of the Series and apply cummax; since this has to be done within each group, use groupby.cummax. Finally, invert the boolean mask and filter the rows with loc:
out = df.loc[~df['value'].isna()[::-1].groupby(df['group']).cummax()]
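Broken into steps (the intermediate names are only for illustration), the one-liner computes:
# True for the NaN rows, with the Series in reversed row order
is_na_rev = df['value'].isna()[::-1]
# propagate True through each group in the reversed order, so every NaN
# and every row before it (in the original order) ends up flagged
to_drop = is_na_rev.groupby(df['group']).cummax()
# keep the unflagged rows; .loc aligns the mask on the index,
# so the reversed order does not matter
out = df.loc[~to_drop]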
If the values to flag were inf rather than NaN, you could use abs + eq instead:
out = df.loc[~df['value'].abs().eq(float('inf'))[::-1].groupby(df['group']).cummax()]
Output (of the NaN version):
    group     date  value
3       1  feb2019    3.0
4       1  mar2019    4.0
5       1  mar2019    5.0
8       2  mar2019    7.0
9       2  mar2019    4.0
10      2  apr2019    5.0
For the second desired output, which also keeps the last NaN row of each group, shift the mask by one position within each group with groupby.shift. Since the Series is still in reversed order, this extends the kept block one row earlier (onto the last NaN), and fillna(True) fills the gap the shift leaves at each group's last row:
out = df.loc[(~df['value'].isna()[::-1].groupby(df['group']).cummax()).groupby(df['group']).shift().fillna(True)]
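The same thing written out in steps (again with illustrative intermediate names):
# version-1 keep-mask: True for the rows after the last NaN (still in reversed order)
keep = ~df['value'].isna()[::-1].groupby(df['group']).cummax()
# shifting within each group while reversed moves the kept block one row
# earlier in the original order, so the last NaN row is kept as well;
# the gap the shift leaves at each group's last row is filled with True
keep_with_nan = keep.groupby(df['group']).shift().fillna(True)
out = df.loc[keep_with_nan]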
Output:
    group     date  value
2       1  jan2019    NaN
3       1  feb2019    3.0
4       1  mar2019    4.0
5       1  mar2019    5.0
7       2  feb2019    NaN
8       2  mar2019    7.0
9       2  mar2019    4.0
10      2  apr2019    5.0