Strange problem.
I have a dtype == object dataframe column with string values and NaNs. Looks like this:
df
Response
0 Email
1 NaN
2 NaN
3 Call
4 Email
5 Email
I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.
My code looks like this:
import numpy as np
import pandas as pd
most_freq_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_freq_cat, inplace = True)
The results look like this:
df Response
0 Email
1 0 Email\ndtype: object
2 0 Email\ndtype: object
3 Call
4 Email
5 Email
'0 Email\ndtype: object' is different than 'Email'. If I remove the str, there is no replacement of the original NaNs.
What am I doing wrong?
Don't use DataFrame.fillna with inplace=True. Actually, I would recommend forgetting that argument exists entirely. Use Series.fillna instead, since you only need this on one column, and assign the result back.
Another thing to note is that mode can return multiple modes if there is no single mode. In that case it should suffice to either select the first one, or one at random (an exercise for you).
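To make the point about mode concrete, here is a small sketch that rebuilds the column from the question (the variable names s, modes, first_mode and tied are mine, not from the original post). It suggests why both attempts failed: mode returns a Series, so str() turns the whole Series repr into the fill value, and passing the Series itself to fillna fills by index label, which only covers row 0 here.

```python
import numpy as np
import pandas as pd

# Rebuild the column from the question (assumed data, copied from the post)
s = pd.Series(["Email", np.nan, np.nan, "Call", "Email", "Email"], name="Response")

modes = s.mode()               # a Series, not a scalar; NaN is ignored by default
as_text = str(modes)           # the multi-line repr that ended up in the NaN cells

# Filling with the Series aligns on index labels: modes only has label 0,
# and row 0 is not NaN, so rows 1 and 2 stay missing.
unchanged = s.fillna(modes)

first_mode = modes.iat[0]      # a plain string, safe to use as a fill value
filled = s.fillna(first_mode)

# With a tie, mode returns every tied value, so pick one explicitly.
tied = pd.Series(["Email", "Call", np.nan]).mode()
print(len(tied))
```

Selecting with .iat[0] works in both the single-mode and the tied case, which is why it is a reasonable default when any of the most frequent values will do.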
Here's my recommended syntax:
# call fillna on the column and assign it back
df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
df
Response
0 Email
1 Email
2 Email
3 Call
4 Email
5 Email
You can also do a per-column fill if you have multiple columns with NaNs to fill. Again, the procedure is similar: call mode on your columns, then get the first mode for each column and use it as the argument to DataFrame.fillna this time:
df.fillna(df.mode().iloc[0])
Response
0 Email
1 Email
2 Email
3 Call
4 Email
5 Email
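As a rough illustration of the per-column version with more than one column, the sketch below adds a second column, Region, which is hypothetical and not part of the original question. df.mode().iloc[0] gives a Series indexed by column name, and DataFrame.fillna uses each entry for the matching column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Response": ["Email", np.nan, np.nan, "Call", "Email", "Email"],
    # Hypothetical second column, added only for illustration
    "Region": ["North", "South", np.nan, "South", "South", np.nan],
})

# First row of df.mode(): one most frequent value per column
first_modes = df.mode().iloc[0]

# fillna returns a new DataFrame; assign it back if you want to keep the result
filled = df.fillna(first_modes)
print(filled)
```

Note that df itself is left untouched here, so write df = df.fillna(first_modes) if the filled values should replace the original.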