
Cannot get fillna to work when using mode to replace NaNs with the most frequent string value in a column


Strange problem.

I have a DataFrame column of dtype object that contains string values and NaNs. It looks like this:

df   
     Response    
0    Email
1    NaN
2    NaN
3    Call
4    Email
5    Email

I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.

My code looks like this:

import numpy as np
import pandas as pd

most_freq_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_freq_cat, inplace=True)

The results look like this:

df   Response    

0    Email
1    0    Email\ndtype: object
2    0    Email\ndtype: object
3    Call
4    Email
5    Email

The filled value '0    Email\ndtype: object' is not the same as 'Email'.

If I remove the str call, the original NaNs are not replaced at all.

What am I doing wrong?


Solution

  • Don't use DataFrame.fillna with inplace=True. Actually, I would recommend forgetting that argument exists entirely. Since you only need this for one column, use Series.fillna on that column and assign the result back.
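
To see why both attempts in the question misbehave, here is a minimal sketch that rebuilds the sample frame from the question. It assumes the usual behaviour that mode returns a Series and that fillna aligns a Series argument on index labels; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({"Response": ["Email", np.nan, np.nan, "Call", "Email", "Email"]})

# mode() returns a Series (here with the single label 0), not a plain string
mode_series = df["Response"].mode()

# str() of a Series is its printed form, index and dtype line included,
# which is why that whole text ended up in the filled cells
as_text = str(mode_series)

# Without str(), fillna aligns the Series on index labels. The mode Series
# only has label 0, and row 0 is not NaN, so nothing gets filled
aligned = df["Response"].fillna(mode_series)

# Pulling out the scalar first gives the intended result
fixed = df["Response"].fillna(mode_series.iat[0])
```

Here `aligned` still holds the two NaNs, while `fixed` has them replaced with the plain string.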

    Another thing to note is that mode returns a Series, which holds more than one value when several values tie for most frequent. In that case it should suffice to select either the first one or one at random (an exercise for you).
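
As a rough illustration of the tie case, the sketch below uses a made-up Series (not the data from the question) in which two values occur equally often. Picking the first mode is deterministic because the modes come back sorted; the random pick via sample is just one possible way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Email" and "Call" both occur twice
tied = pd.Series(["Email", "Call", np.nan, "Call", "Email"])

# Both values are returned, in sorted order
modes = tied.mode()

# Option 1: take the first mode
first = modes.iat[0]

# Option 2: pick one of the modes at random (seeded here for repeatability)
random_pick = modes.sample(n=1, random_state=0).iat[0]

filled = tied.fillna(first)
```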

    Here's my recommended syntax:

    # call fillna on the column and assign it back
    df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
    df
     
      Response
    0    Email
    1    Email
    2    Email
    3     Call
    4    Email
    5    Email
    

    You can also do a per-column fill if you have multiple columns with NaNs to fill. The procedure is similar: call mode on the DataFrame, take the first mode for each column, and pass the result to DataFrame.fillna this time:

    df.fillna(df.mode().iloc[0])
    
      Response
    0    Email
    1    Email
    2    Email
    3     Call
    4    Email
    5    Email
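
To make the per-column behaviour easier to see, here is a small sketch with a second column added. The Channel column and its values are hypothetical and not part of the question; the point is only that df.mode().iloc[0] is a Series keyed by column name, so each column is filled with its own mode.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Response": ["Email", np.nan, np.nan, "Call", "Email", "Email"],
    # Hypothetical second column, added for illustration only
    "Channel": ["Web", "Web", np.nan, "Store", np.nan, "Web"],
})

# First row of df.mode(): one value per column, indexed by column name
first_modes = df.mode().iloc[0]

# fillna matches that Series against the column names
filled = df.fillna(first_modes)
```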