normalization, missing-data, data-cleaning, fillna

Replacing missing values with the mode does not work for every missing value


I want to replace my missing values with the mode of the data, but my code replaces only one of them. Why?

My data is:

0         NaN
1         NaN
2      normal
3      normal
4      normal
        ...  
395    normal
396    normal
397    normal
398    normal
399    normal
Name: rbc, Length: 400, dtype: object

My code is:

rbc = data_penyakit['rbc'].mode()
rbc = data_penyakit['rbc'].mask(pd.isna, rbc)
rbc

And the result is:

0      normal
1         NaN
2      normal
3      normal
4      normal
        ...  
395    normal
396    normal
397    normal
398    normal
399    normal
Name: rbc, Length: 400, dtype: object

Why is the second missing value not replaced?


Solution

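  • mode() returns a Series, not a scalar. Here it most likely holds a single value, 'normal', at index 0. mask aligns that replacement Series with the column by index, so only row 0 receives a value and the missing value at row 1 stays NaN. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mode.html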

    So take the first mode as a scalar and fill with that:

    fill = data_penyakit['rbc'].mode().iloc[0]
    rbc = data_penyakit['rbc'].fillna(fill)
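
To see the difference between the two approaches, here is a minimal sketch. The small Series `s` is a made-up stand-in for data_penyakit['rbc'] (the real column is not available here), so treat the values as an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_penyakit['rbc']: two missing values, then strings.
s = pd.Series([np.nan, np.nan, "normal", "normal", "abnormal"], name="rbc")

# mode() returns a Series of the most frequent values, indexed from 0.
mode = s.mode()

# Passing that Series to mask aligns it with s by index label,
# so only the row labelled 0 finds a replacement value.
masked = s.mask(s.isna(), mode)

# Taking the first mode as a plain scalar avoids the alignment:
# a scalar is applied to every missing row.
fill = mode.iloc[0]
filled = s.fillna(fill)

print(masked.isna().sum())  # missing values left after mask
print(filled.isna().sum())  # missing values left after fillna
```

If several values tie for most frequent, mode() returns all of them, and .iloc[0] simply picks the first one.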