Search code examples
pythonpandasdataframenumpynan

How to only remove rows with NaN that are not at the beginning or end of the pandas Dataframe column?


I have a pandas dataframe. It has a particular column which may or may not contain a continuous set of values as NaN's in its starting and ending. Also, it may or may not contain NaN's intermittently in between as well.

My objective is to eliminate only all those rows where NaN's may intermittently be present in between.

If for example if this is my df:-

df = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, np.nan, 2, np.nan, 3, np.nan, 6, np.nan, np.nan, 8, 9, 15, np.nan, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,11,34,55,67,55,66,22,11,34,33,44,55,6,96,64,93,81]})

Then I need the output as:-

df_new = pd.DataFrame({'A': [np.nan, np.nan, np.nan, np.nan, 45, 1, 2, 3, 6, 8, 9, 15, 18, np.nan, np.nan, np.nan, np.nan],
'B': [22,33,44,55,66,22,34,67,66,34,33,44,6,96,64,93,81]})

Can you please help?


Solution

  • You can use multiple conditions for boolean indexing, with ffill/bfill to propagate the non-NA, and isna/notna:

    # is the A not NA?
    m1 = df['A'].notna()
    # is the A an external NA?
    m2 = df['A'].ffill().isna()
    m3 = df['A'].bfill().isna()
    
    out = df.loc[m1|m2|m3]
    

    Alternative with propagation of the booleans using cummin:

    m1 = df['A'].isna()
    m2 = m1.cummin()
    m3 = m1[::-1].cummin()
    
    out = df.loc[(~m1)|m2|m3]
    

    A third approach using interpolate to only fill the inner values, then comparing the NA status, if it didn't change with interpolation we either have a non-NA or an outer NA:

    out = df.loc[df['A'].isna().eq(df['A'].interpolate(limit_area='inside').isna())]
    

    Output:

           A   B
    0    NaN  22
    1    NaN  33
    2    NaN  44
    3    NaN  55
    4   45.0  66
    5    1.0  22
    7    2.0  34
    9    3.0  67
    11   6.0  66
    14   8.0  34
    15   9.0  33
    16  15.0  44
    18  18.0   6
    19   NaN  96
    20   NaN  64
    21   NaN  93
    22   NaN  81
    

    Intermediates (first approach):

           A   B     m1     m2     m3  m1|m2|m3
    0    NaN  22  False   True  False      True
    1    NaN  33  False   True  False      True
    2    NaN  44  False   True  False      True
    3    NaN  55  False   True  False      True
    4   45.0  66   True  False  False      True
    5    1.0  22   True  False  False      True
    6    NaN  11  False  False  False     False
    7    2.0  34   True  False  False      True
    8    NaN  55  False  False  False     False
    9    3.0  67   True  False  False      True
    10   NaN  55  False  False  False     False
    11   6.0  66   True  False  False      True
    12   NaN  22  False  False  False     False
    13   NaN  11  False  False  False     False
    14   8.0  34   True  False  False      True
    15   9.0  33   True  False  False      True
    16  15.0  44   True  False  False      True
    17   NaN  55  False  False  False     False
    18  18.0   6   True  False  False      True
    19   NaN  96  False  False   True      True
    20   NaN  64  False  False   True      True
    21   NaN  93  False  False   True      True
    22   NaN  81  False  False   True      True