Pandas lambda function with NaN support


I am trying to write a lambda function in Pandas that checks whether Col1 is NaN and, if so, uses the value from another column instead. I cannot get the code below to run correctly.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [7, 8, 9, 10]})
df2 = df.apply(lambda x: x['Col2'] if x['Col1'].isnull() else x['Col1'], axis=1)

Does anyone know how to write this with a lambda function, or have I exceeded what a lambda can do? If a lambda is not suitable, is there another solution?


Solution

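  • With axis=1 the lambda receives one row at a time, so x['Col1'] is a scalar, and a scalar has no .isnull() method. Use pandas.isnull to check whether a scalar is NaN: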

    df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                       'Col2': [8, 9, 7, 10]})
                     
    df2 = df.apply(lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1)
    
    print(df)
       Col1  Col2
    0   1.0     8
    1   2.0     9
    2   3.0     7
    3   NaN    10
    
    print(df2)
    0     1.0
    1     2.0
    2     3.0
    3    10.0
    dtype: float64
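
To see why the original .isnull() call fails, the following sketch inspects what the lambda receives for a single row. It assumes the same two-column frame as above; the names row, value, is_scalar, has_isnull_method and is_missing are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})

# With axis=1, apply passes each row as a Series; this is the last row.
row = df.iloc[3]
value = row['Col1']

# Indexing the row by label returns a scalar, not a Series.
is_scalar = np.isscalar(value)

# Scalars have no .isnull() method, which is why the original lambda fails.
has_isnull_method = hasattr(value, 'isnull')

# pd.isnull accepts scalars as well as array-likes.
is_missing = pd.isnull(value)

print(type(value).__name__, is_scalar, has_isnull_method, is_missing)
```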
    

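    A simpler and usually faster option is Series.combine_first, which fills the NaN values in Col1 with the values from Col2 at the same index: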

    df['Col1'] = df['Col1'].combine_first(df['Col2'])
    
    print(df)
       Col1  Col2
    0   1.0     8
    1   2.0     9
    2   3.0     7
    3  10.0    10
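
When both columns share the same index, as they do here, Series.fillna with another Series should give the same result as combine_first. The sketch below checks that on the example data; filled_a and filled_b are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})

# Fill only the missing entries of Col1 with the value from Col2 at the same index.
filled_a = df['Col1'].combine_first(df['Col2'])
filled_b = df['Col1'].fillna(df['Col2'])

# Both approaches leave 1, 2, 3 untouched and replace the NaN with 10.
print(filled_a.equals(filled_b))
print(filled_a.tolist())
```

One difference to keep in mind: combine_first aligns on the union of both indexes, so it can return rows that exist only in the second Series, while fillna keeps the index of the first.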
    

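    Series.update is a different operation: it overwrites every value in Col1 with the non-NaN value from Col2 at the same index, so the existing values in Col1 are not preserved. Use it only if Col2 should take precedence. It modifies the Series in place, and with Copy-on-Write enabled, calling it on df['Col1'] may leave df itself unchanged; the output below is from a setup without Copy-on-Write: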

    df['Col1'].update(df['Col2'])
    print(df)
       Col1  Col2
    0   8.0     8
    1   9.0     9
    2   7.0     7
    3  10.0    10
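
The difference between update and combine_first is easy to miss on a small frame. The sketch below runs both on the example data. It works on an explicit copy of the column (the names kept and col1 are illustrative) and assigns the result back, so it does not rely on an in-place change reaching df under any particular Copy-on-Write setting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})

# combine_first keeps existing Col1 values and fills only the gap.
kept = df['Col1'].combine_first(df['Col2'])

# update works in place on the Series it is called on, so use an explicit copy
# and assign it back to the frame afterwards.
col1 = df['Col1'].copy()
col1.update(df['Col2'])
df['Col1'] = col1

print(kept.tolist())
print(df['Col1'].tolist())
```

Here kept still holds the original 1, 2 and 3, while the updated column takes every value from Col2.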