I have a pandas dataframe. It has a particular column which may or may not contain a continuous set of values as NaN's in its starting and ending. Also, it may or may not contain NaN's intermittently in between as well.
My objective is to eliminate only all those rows where NaN's may intermittently be present in between.
If for example if this is my df:-
df = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, np.nan, 2, np.nan, 3, np.nan, 6, np.nan, np.nan, 8, 9, 15, np.nan, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,11,34,55,67,55,66,22,11,34,33,44,55,6,96,64,93,81]})
Then I need the output as:-
df_new = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, 2, 3, 6, 8, 9, 15, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,34,67,66,34,33,44,6,96,64,93,81]})
Can you please help?
You can use multiple conditions for boolean indexing, with ffill
/bfill
to propagate the non-NA, and isna
/notna
:
# is the A not NA?
m1 = df['A'].notna()
# is the A an external NA?
m2 = df['A'].ffill().isna()
m3 = df['A'].bfill().isna()
out = df.loc[m1|m2|m3]
Alternative with propagation of the booleans using cummin
:
m1 = df['A'].isna()
m2 = m1.cummin()
m3 = m1[::-1].cummin()
out = df.loc[(~m1)|m2|m3]
A third approach using interpolate
to only fill the inner values, then comparing the NA status, if it didn't change with interpolation we either have a non-NA or an outer NA:
out = df.loc[df['A'].isna().eq(df['A'].interpolate(limit_area='inside').isna())]
Output:
A B
0 NaN 22
1 NaN 33
2 NaN 44
3 NaN 55
4 45.0 66
5 1.0 22
7 2.0 34
9 3.0 67
11 6.0 66
14 8.0 34
15 9.0 33
16 15.0 44
18 18.0 6
19 NaN 96
20 NaN 64
21 NaN 93
22 NaN 81
Intermediates (first approach):
A B m1 m2 m3 m1|m2|m3
0 NaN 22 False True False True
1 NaN 33 False True False True
2 NaN 44 False True False True
3 NaN 55 False True False True
4 45.0 66 True False False True
5 1.0 22 True False False True
6 NaN 11 False False False False
7 2.0 34 True False False True
8 NaN 55 False False False False
9 3.0 67 True False False True
10 NaN 55 False False False False
11 6.0 66 True False False True
12 NaN 22 False False False False
13 NaN 11 False False False False
14 8.0 34 True False False True
15 9.0 33 True False False True
16 15.0 44 True False False True
17 NaN 55 False False False False
18 18.0 6 True False False True
19 NaN 96 False False True True
20 NaN 64 False False True True
21 NaN 93 False False True True
22 NaN 81 False False True True