Tags: python, pandas, dataframe, pandas-apply

Python pandas dataframe apply result of function to multiple columns where NaN


I have a dataframe with three columns and a function that calculates the values of columns y and z from the value of column x. I only need to calculate the values where they are missing (NaN).

import numpy as np
import pandas as pd

def calculate(x):
    return 1, 2

df = pd.DataFrame({'x': ['a', 'b', 'c', 'd', 'e', 'f'], 'y': [np.nan, np.nan, np.nan, 'a1', 'b2', 'c3'], 'z': [np.nan, np.nan, np.nan, 'a2', 'b1', 'c4']})

   x    y    z
0  a  NaN  NaN
1  b  NaN  NaN
2  c  NaN  NaN
3  d   a1   a2
4  e   b2   b1
5  f   c3   c4

mask = (df.isnull().any(axis=1))

df[['y', 'z']] = df[mask].apply(calculate, axis=1, result_type='expand')

However, I get the following result, even though I only apply the function to the masked rows. I'm unsure what I'm doing wrong.

    x   y   z
0   a   1.0 2.0
1   b   1.0 2.0
2   c   1.0 2.0
3   d   NaN NaN
4   e   NaN NaN
5   f   NaN NaN

If the mask is inverted I get the following result:

df[['y', 'z']] = df[~mask].apply(calculate, axis=1, result_type='expand')

    x   y   z
0   a   NaN NaN
1   b   NaN NaN
2   c   NaN NaN
3   d   1.0 2.0
4   e   1.0 2.0
5   f   1.0 2.0

Expected result:

   x    y    z
0  a  1.0   2.0
1  b  1.0   2.0
2  c  1.0   2.0
3  d   a1   a2
4  e   b2   b1
5  f   c3   c4

Solution

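  • You can calculate the values for the full dataframe, rename the result columns to y and z with set_axis, and pass that to fillna so that only the missing cells are filled:

    # fillna aligns on index and column labels, so existing values are kept
    out = df.fillna(
        df.apply(calculate, axis=1, result_type='expand')
          .set_axis(['y', 'z'], axis=1)
    )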
    

    print(out)
    
       x   y   z
    0  a   1   2
    1  b   1   2
    2  c   1   2
    3  d  a1  a2
    4  e  b2  b1
    5  f  c3  c4
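
The attempt in the question overwrites the existing values because assigning to df[['y', 'z']] replaces both columns in full: the masked result only has rows 0 to 2, and index alignment fills the other rows with NaN. If calculate is expensive and should only run on the rows with missing values, one possible alternative is to write the result back through df.loc with the same mask. This is a minimal sketch rather than part of the original answer. The dtype=object argument and the use of to_numpy() are assumptions made here to keep the mixed string/number columns and the positional assignment simple.

```python
import numpy as np
import pandas as pd


def calculate(row):
    # Placeholder from the question: returns the same pair for every row.
    return 1, 2


# dtype=object is an assumption made here so that the y/z columns can hold
# both strings and numbers regardless of the pandas version in use.
df = pd.DataFrame(
    {
        'x': ['a', 'b', 'c', 'd', 'e', 'f'],
        'y': [np.nan, np.nan, np.nan, 'a1', 'b2', 'c3'],
        'z': [np.nan, np.nan, np.nan, 'a2', 'b1', 'c4'],
    },
    dtype=object,
)

# Rows where y or z is missing.
mask = df[['y', 'z']].isna().any(axis=1)

# Run calculate on the masked rows only; the result columns are labelled 0 and 1.
computed = df[mask].apply(calculate, axis=1, result_type='expand')

# Write into the masked rows only. to_numpy() makes the assignment positional,
# so the 0/1 column labels do not need to be renamed.
df.loc[mask, ['y', 'z']] = computed.to_numpy()

print(df)
```

Rows 3 to 5 are never touched by the assignment, so their original values in y and z are kept, and calculate is called three times instead of six.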