I have a data set with a column named 'Native Country' that contains around 30,000 records. Some values are missing, represented by NaN, so I thought I would fill them with the mode() value. I wrote something like this:
data['Native Country'].fillna(data['Native Country'].mode(), inplace=True)
However, when I count the missing values:
for col_name in data.columns:
    print("column:", col_name, ".Missing:", sum(data[col_name].isnull()))
It still reports the same number of NaN values for the 'Native Country' column.
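mode() returns a Series rather than a single value (there can be ties), so fillna() aligns it by index instead of filling every NaN. Take the first element of that Series: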
data['Native Country'].fillna(data['Native Country'].mode()[0], inplace=True)
Or you can do the same with assignment:
data['Native Country'] = data['Native Country'].fillna(data['Native Country'].mode()[0])
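To see why the original call left the NaN count unchanged, here is a minimal sketch on made-up data. The six-row frame and the variable names are assumptions for illustration, not the asker's real data set. It compares passing the whole mode() Series with passing its first element:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real 'Native Country' column.
data = pd.DataFrame(
    {"Native Country": ["India", np.nan, "India", "Cuba", np.nan, "India"]}
)

# mode() returns a Series indexed 0, 1, ... (one entry per tied mode).
mode_values = data["Native Country"].mode()
print(type(mode_values).__name__, list(mode_values))

# Passing the Series: fillna matches on index labels, so only rows whose
# label also appears in mode_values (here just label 0) could be filled.
filled_by_series = data["Native Country"].fillna(mode_values)
missing_after_series = int(filled_by_series.isna().sum())

# Passing the scalar: every missing row receives the same value.
filled_by_scalar = data["Native Country"].fillna(mode_values.iloc[0])
missing_after_scalar = int(filled_by_scalar.isna().sum())

print("missing after Series fill:", missing_after_series)
print("missing after scalar fill:", missing_after_scalar)
```

Assigning the result back to the column, as in the last line of the answer, also avoids relying on inplace=True on a column selection, which recent pandas versions warn about.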