
Pandas: applying a simple function to NaN returns a value instead of NaN?


import pandas as pd
import numpy as np

df = pd.DataFrame(
  {'a': [0, 1, 2, 3],
   'b': [np.nan, np.nan, np.nan, 3]}
)
df.apply(lambda x: x > 1)

This returns False for the missing entries in column b, whereas I would like to get NaN there:

    a       b
0   False   False
1   False   False
2   True    False
3   True    True
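The False values come from the comparison itself, not from apply: under IEEE 754 floating-point rules, any ordered comparison involving NaN evaluates to False, and pandas applies that rule element-wise. A minimal check (the Series `s` is just an illustrative example):

```python
import numpy as np
import pandas as pd

# Any ordered comparison involving NaN evaluates to False (IEEE 754 rule).
print(np.nan > 1)   # False
print(np.nan <= 1)  # False

# pandas compares element-wise with the same rule, so a missing value
# silently turns into False instead of staying missing.
s = pd.Series([np.nan, 3.0])
print((s > 1).tolist())  # [False, True]
```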

Expected output:

    a       b
0   False   NaN
1   False   NaN
2   True    NaN
3   True    True

I'd really like my arithmetic to keep track of where I had data and where I didn't. How can I achieve that?


Solution

  • Use DataFrame.mask with DataFrame.isna, or equivalently DataFrame.where with DataFrame.notna, to put NaN back wherever the input was missing:

    df = df.apply(lambda x: x > 1).mask(df.isna())
    #df = df.apply(lambda x: x > 1).where(df.notna())
    

    For better performance, avoid apply and compare the whole DataFrame at once (starting again from the original df):

    df = (df > 1).mask(df.isna())
    #df = (df > 1).where(df.notna())
    
    print(df)
           a    b
    0  False  NaN
    1  False  NaN
    2   True  NaN
    3   True  1.0
    

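    Finally, convert to the nullable boolean dtype, so column b is boolean again (instead of 1.0/NaN) and missing values are shown as <NA>: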

    df = df.astype('boolean')
    print(df)
           a     b
    0  False  <NA>
    1  False  <NA>
    2   True  <NA>
    3   True  True
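Putting the steps above together, here is a minimal runnable sketch that rebuilds the question's DataFrame and checks that the mask and where variants agree. It also shows one alternative that is not part of the answer above: converting to pandas' nullable dtypes first with DataFrame.convert_dtypes, so that the comparison returns <NA> for missing inputs on its own. The variable names (via_mask, via_where, via_nullable) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3],
                   'b': [np.nan, np.nan, np.nan, 3]})

# Approach from the answer: vectorized comparison, put NaN back where the
# input was missing, then switch to the nullable boolean dtype.
via_mask = (df > 1).mask(df.isna()).astype('boolean')

# where() with the inverted condition keeps exactly the same cells.
via_where = (df > 1).where(df.notna()).astype('boolean')

# Alternative (not from the answer above): convert to nullable dtypes first
# (here Int64); comparisons on nullable columns yield <NA> for missing inputs.
via_nullable = df.convert_dtypes() > 1

print(via_mask)
print(via_mask.equals(via_where), via_mask.equals(via_nullable))
```

The nullable-dtype route keeps the "where did I have data" information attached to the values themselves, so later arithmetic and comparisons keep propagating <NA> without a separate mask step.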