
Get rows with NaN as well as the preceding and following non-NaN rows


I have the following snippet from an example dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00', '2022-06-01 15:00:00']})

I would like to retrieve the rows where location is NaN, as well as the non-NaN rows directly above and below them.

The expected result is:

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland'], 'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00']})

How can I achieve this? The real dataframe is larger than this snippet and contains other cases, but the general rule is: retrieve all rows where location is NaN, plus the nearest non-NaN row above and below.


Solution

  • Use boolean indexing and chain the masks with | (bitwise OR):

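    # True where location is missing
    m = df['location'].isna()

    # Keep a row if it is NaN itself (m), if the previous row is NaN
    # (m.shift(), so this row follows a NaN), or if the next row is NaN
    # (m.shift(-1), so this row precedes a NaN).
    # fill_value=False keeps the mask boolean at the first and last rows.
    df = df[m.shift(fill_value=False) | m.shift(-1, fill_value=False) | m]
    print(df)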
       location                 time
    0   Seattle  2022-06-01 12:00:00
    1       NaN  2022-06-01 13:00:00
    2  Portland  2022-06-01 14:00:00
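Since the real dataframe contains other cases, here is a minimal sketch of the same idea generalised to `k` neighbours on each side. The helper `nan_with_neighbors` and the sample data are hypothetical: a centred rolling maximum over the NaN mask marks every row within `k` positions of a NaN. With `k=1` it should select the same rows as the `shift` version, including runs of consecutive NaNs and a NaN in the first row.

```python
import numpy as np
import pandas as pd


def nan_with_neighbors(df: pd.DataFrame, col: str, k: int = 1) -> pd.DataFrame:
    """Hypothetical helper: rows where `col` is NaN plus up to k rows on each side."""
    # 1 where the value is missing, 0 otherwise
    m = df[col].isna().astype(int)
    # Centred window of 2k+1 rows; min_periods=1 lets the first and last
    # rows use a truncated window instead of producing NaN
    near_nan = m.rolling(2 * k + 1, center=True, min_periods=1).max().astype(bool)
    return df[near_nan]


# Made-up data: a NaN in the first row and a run of two consecutive NaNs
df = pd.DataFrame({
    'location': [np.nan, 'Seattle', 'Boise', np.nan, np.nan,
                 'Portland', 'Reno', 'San Francisco'],
    'time': [f'2022-06-01 {h:02d}:00:00' for h in range(12, 20)],
})

result = nan_with_neighbors(df, 'location')
print(result)
```

A rolling window avoids chaining one `shift` per offset when `k` grows; for `k=1`, the explicit `shift` masks are just as clear.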