I am stuck on a problem which looks simple but for which I cannot find a proper solution.
Consider a given Pandas dataframe df, composed of multiple columns A1, A2, etc., and let Ai be one of its columns, filled for example as follows:
Ai |
---|
25 |
30 |
30 |
NaN |
12 |
15 |
15 |
NaN |
I would like to delete all the rows in df for which the value of Ai lies between a NaN and a "further change" in its value, so that my output (for column Ai) would be:
Ai |
---|
25 |
NaN |
12 |
NaN |
Any idea on how to do so would be very much appreciated. Thank you very much in advance.
Similar to the mask-based solution, but with a filter per group so that the consecutive duplicates of each kept value are kept as well:
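m = df['Ai'].isna()
# label each run of consecutive identical values
groups = df['Ai'].ne(df['Ai'].shift()).cumsum()
# keep every run that contains a NaN, the row right after a NaN, or the first row
df.loc[(m | m.shift(fill_value=True))
       .groupby(groups)
       .filter(lambda d: d.sum() > 0).index]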
Output (on an input with consecutive duplicate values, not the sample from the question):
Ai
0 25.0
1 25.0
2 25.0
5 NaN
6 30.0
7 30.0
9 NaN
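The input behind the output above is not shown in the post. As a minimal sketch, the frame below is a hypothetical input (the values 30, 30 at rows 3 and 4 and 12 at row 8 are assumptions chosen for illustration) on which the group filter selects the same index labels. The names df_demo, seed, runs and kept are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical input, NOT taken from the original post
df_demo = pd.DataFrame({"Ai": [25, 25, 25, 30, 30, np.nan, 30, 30, 12, np.nan]})

m = df_demo["Ai"].isna()
# True on NaN rows, on the row after each NaN, and on the first row
seed = m | m.shift(fill_value=True)
# Run id: increases whenever the value differs from the previous row
runs = df_demo["Ai"].ne(df_demo["Ai"].shift()).cumsum()
# Keep whole runs that contain at least one seeded row
kept = seed.groupby(runs).filter(lambda s: s.any())
result_demo = df_demo.loc[kept.index]
print(result_demo)
```

Because NaN never compares equal to NaN, every NaN row forms its own run, so NaN rows are always kept individually.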
This is equivalent to selecting each NaN row and the row right after it (plus the first row, via fill_value=True). You could use a boolean mask:
m = df['Ai'].isna()
df[m|m.shift(fill_value=True)]
Output:
Ai
0 25.0
3 NaN
4 12.0
7 NaN
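For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample column from the question and applies the same mask. The names keep and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample column from the question
df = pd.DataFrame({"Ai": [25, 30, 30, np.nan, 12, 15, 15, np.nan]})

m = df["Ai"].isna()                  # True on NaN rows
keep = m | m.shift(fill_value=True)  # also True on the row after a NaN and on row 0
result = df[keep]
print(result)
```

The fill_value=True argument is what keeps the first row: shifting the mask down by one leaves position 0 empty, and filling it with True treats the start of the column as if it followed a NaN.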