I am trying to write a lambda function in Pandas that checks whether Col1 is NaN and, if so, uses another column's data. I am having trouble getting the code below to execute correctly.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [7, 8, 9, 10]})
df2 = df.apply(lambda x: x['Col2'] if x['Col1'].isnull() else x['Col1'], axis=1)
Does anyone have a good idea of how to write this with a lambda function, or have I exceeded what a lambda can do? If so, is there another solution?
With axis=1, x['Col1'] is a scalar float rather than a Series, so it has no .isnull() method. You need pandas.isnull to check if a scalar is NaN:
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})
df2 = df.apply(lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1)
print(df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3   NaN    10
print(df2)
0     1.0
1     2.0
2     3.0
3    10.0
dtype: float64
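To see why the original lambda fails, here is a minimal sketch that pulls one row out of the same frame and inspects the value the lambda receives. The variable names (`last_row`, `scalar`, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# With axis=1, apply passes each row to the function as a Series,
# so row['Col1'] is a single float, not a Series.
last_row = df.iloc[3]
scalar = last_row['Col1']

# A float scalar has no .isnull() method, which is what breaks the original lambda.
has_isnull_method = hasattr(scalar, 'isnull')

# The module-level function accepts scalars as well as Series.
is_missing = pd.isnull(scalar)

print(has_isnull_method, is_missing)
```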
But it is better to use Series.combine_first, which is vectorized and avoids the row-by-row apply:
df['Col1'] = df['Col1'].combine_first(df['Col2'])
print(df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3  10.0    10
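As a quick check, the sketch below compares the row-wise lambda with combine_first on a fresh copy of the data. It also tries Series.fillna with a Series argument, which is not part of the answer above but should give the same values here. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# Row-wise version from the answer: one Python call per row.
row_wise = df.apply(
    lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1
)

# Vectorized versions: take Col1, fall back to Col2 where Col1 is missing.
combined = df['Col1'].combine_first(df['Col2'])
filled = df['Col1'].fillna(df['Col2'])  # alternative not mentioned in the answer

same_as_apply = combined.tolist() == row_wise.tolist()
same_as_fillna = combined.tolist() == filled.tolist()
print(same_as_apply, same_as_fillna)
```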
Another solution is Series.update, but note that it overwrites every value in Col1 for which Col2 is not NaN, not only the missing ones:
df['Col1'].update(df['Col2'])
print(df)
   Col1  Col2
0   8.0     8
1   9.0     9
2   7.0     7
3  10.0    10
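The output above shows that update replaced all of Col1, not just the NaN. If only the missing entries should change, one possible approach (an assumption on my part, not something the answer proposes) is to pass update only the rows where Col1 is NaN. The sketch uses standalone Series with illustrative names.

```python
import numpy as np
import pandas as pd

col1 = pd.Series([1, 2, 3, np.nan], name='Col1')
col2 = pd.Series([8, 9, 7, 10], name='Col2')

# Plain update: every non-NaN value of col2 replaces the value in col1.
overwritten = col1.copy()
overwritten.update(col2)  # modifies the Series in place, returns None

# Restricted update: only pass the col2 values at positions where col1 is NaN.
only_missing = col1.copy()
only_missing.update(col2[col1.isna()])

print(overwritten.tolist())
print(only_missing.tolist())
```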