I have the following snippet from an example dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00', '2022-06-01 15:00:00']})
I would like to retrieve the rows where location is NaN, as well as the non-NaN rows directly above and below them,
so that the result looks like this:
df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland'], 'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00']})
How can I achieve this? The real dataframe is larger than this snippet and contains other cases, but the general rule is: retrieve all rows with NaN plus the adjacent non-NaN rows above and below them.
Use boolean indexing and chain the masks with | (bitwise OR):
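# True where location is NaN
m = df['location'].isna()
# keep a row if it is NaN, or if the row above it (shift) or below it (shift(-1)) is NaN;
# fill_value=False keeps the shifted masks boolean at the first and last rows
df = df[m.shift(fill_value=False) | m.shift(-1, fill_value=False) | m]
print(df)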
   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
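Since the real dataframe has "different cases", here is a small sketch that wraps the same mask in a helper and runs it on a frame with a run of consecutive NaNs and a NaN in the last row. The function name nan_rows_with_neighbors and the extra sample rows (Boise, Denver, the added times) are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd


def nan_rows_with_neighbors(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: rows where `col` is NaN plus the row directly above and below each."""
    m = df[col].isna()
    # shift(1) flags the row just below a NaN, shift(-1) flags the row just above it;
    # fill_value=False keeps the shifted masks boolean at the first and last rows.
    keep = m | m.shift(1, fill_value=False) | m.shift(-1, fill_value=False)
    return df[keep]


# Made-up sample data: a run of two NaNs (rows 1-2) and a NaN in the last row (row 6).
df = pd.DataFrame({
    'location': ['Seattle', np.nan, np.nan, 'Portland', 'Boise', 'Denver', np.nan],
    'time': [f'2022-06-01 {h}:00:00' for h in range(12, 19)],
})

result = nan_rows_with_neighbors(df, 'location')
print(result)
```

Only Boise (index 4) is dropped, because it is not adjacent to any NaN. For a run of consecutive NaNs, the rows immediately before and after the run are exactly the nearest non-NaN rows, so checking only the direct neighbors already covers that case; a NaN in the last row simply has no row below it to keep.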