Tags: python, pandas, dataframe, numpy, fillna

Problem using pandas 'apply' to replace missing values


I want to replace the np.nan values in a pandas.DataFrame using the 'apply' function. Inside it I use the replace method so that each NaN is replaced with the max value of its column (axis=0). The example below shows what I mean.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3],
                   'b': [np.nan, 5, 6],
                   'c': [7, 8, np.nan]})

result = df.apply(lambda c: c.replace(np.nan, max(c)), axis=0)
print(result)

There are three np.nan values. Two of them are replaced with the appropriate values, but one (in column b) is still np.nan.

After setting the axis argument to 1, there is still one value that isn't replaced. What's the reason?


Solution

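  • Python's built-in max doesn't work if a sequence starts with NaN: every comparison with NaN is False, so no later value ever replaces it. That is why max(df['b']) returns NaN, which cannot fill the NaN value in that column. Use c.max() instead, which works because Series.max skips NaNs by default (skipna=True). So: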

    df = df.apply(lambda c: c.replace(np.nan, c.max()), axis=0)
    

    But instead of apply and replace, you could call fillna directly with the column maxima:

    df = df.fillna(df.max(), axis=0)
    

    Output:

         a    b    c
    0  1.0  6.0  7.0
    1  3.0  5.0  8.0
    2  3.0  6.0  8.0
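
The difference between the built-in max and Series.max can be checked with a short sketch. It reuses the DataFrame from the question; the names builtin_max, pandas_max, row_max and filled are only illustrative. It also covers the axis=1 case, where the row that starts with NaN shows the same problem.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3],
                   'b': [np.nan, 5, 6],
                   'c': [7, 8, np.nan]})

# Any ordering comparison that involves NaN evaluates to False.
assert not (5 > np.nan)
assert not (np.nan > 5)

# Built-in max keeps the first element unless a later one compares greater,
# so a leading NaN is never displaced (column 'b').
builtin_max = {col: max(df[col]) for col in df.columns}

# Series.max / DataFrame.max skip NaN by default (skipna=True).
pandas_max = df.max()

# Same effect row-wise (axis=1): the row [NaN, 5, 8] starts with NaN.
row_max = [max(row) for row in df.to_numpy()]

# Filling with the NaN-aware column maxima leaves no missing values.
filled = df.fillna(df.max())

print(builtin_max)
print(pandas_max)
print(row_max)
print(filled)
```

Only the column (or row) whose first element is NaN is affected, which is why exactly one value stays unfilled with either axis setting.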