Search code examples
pythonpandasdataframenumpylogical-operators

Python Dataframe avoid non-NaN values dropping during <> operations


My code:

xdf = pd.DataFrame(data={'A':[-10,np.nan,-2.2],'B':[np.nan,2,1.5],'C':[3,1,-0.3]},index=['2023-05-13 08:40:00','2023-05-13 08:41:00','2023-05-13 08:42:00'])
xdf = 
                          A      B      C
2023-05-13 08:40:00     -10.0   NaN     3.0
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Consider only values below 4.0 and above -4.0 in each row of the dataframe

print(xdf[((xdf<4.0).all(axis=1))&((xdf>-4.0).all(axis=1))])

Present output:

                          A      B      C
2023-05-13 08:42:00     -2.2    1.5     -0.3

Expected output: My above code drops a row if there is a NaN in one column, despite other columns satisfying the filter condition. So, I want to omit NaN columns and consider non-NaN columns in <> operation.

                          A      B      C
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Edit:

One working solution:

print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])

Solution

  • I suggest add new mask DataFrame.isna chained with | (bitwise OR) for test missing values:

    print(xdf[((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1)])
                           A    B    C
    2023-05-13 08:41:00  NaN  2.0  1.0
    2023-05-13 08:42:00 -2.2  1.5 -0.3
    

    How it working:

    print ((xdf<4.0) & (xdf>-4.0))
                             A      B     C
    2023-05-13 08:40:00  False  False  True
    2023-05-13 08:41:00  False   True  True
    2023-05-13 08:42:00   True   True  True
    
    print (((xdf<4.0) & (xdf>-4.0) | xdf.isna()))
                             A     B     C
    2023-05-13 08:40:00  False  True  True
    2023-05-13 08:41:00   True  True  True
    2023-05-13 08:42:00   True  True  True
    
    print(((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1))
    2023-05-13 08:40:00    False
    2023-05-13 08:41:00     True
    2023-05-13 08:42:00     True
    dtype: bool
    

    Another idea is use DataFrame.lt and DataFrame.gt :

    print(xdf[(xdf.lt(4.0) & xdf.gt(-4.0) | xdf.isna()).all(axis=1)])
                           A    B    C
    2023-05-13 08:41:00  NaN  2.0  1.0
    2023-05-13 08:42:00 -2.2  1.5 -0.3