
Pandas: Imputing Missing Values to Data Frame


Suppose I have a data frame with some missing values, as below:

import pandas as pd

df = pd.DataFrame([[1,3,'NA',2], [0,1,1,3], [1,2,'NA',1]], columns=['W', 'X', 'Y', 'Z'])
print(df)

The variable Y is missing two values. Say I run some imputation model and come up with an estimate of what the two values should be:

to_impute = [2,1]

What is the best way of replacing the two NA's with those two values? I know of ways that are fairly roundabout, e.g. looping over to_impute and using df.iloc to add each value. But I'm hoping there is a concise and non-iterative way.

(This is something that is easy in R, and I'm hoping it can be easy in Pandas.)


Solution

  • In pandas, a missing value should be NaN rather than the string 'NA'. First replace the strings with NaN, then use fillna with a Series indexed by the missing rows:

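    import numpy as np

    df['Y'] = df['Y'].replace('NA', np.nan)  # turn the 'NA' strings into real NaN
    # fillna aligns on the index, so index the new values by the missing rows
    df['Y'] = df['Y'].fillna(pd.Series([1, 2], index=df.index[df['Y'].isnull()]))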
    df
    Out[1375]: 
       W  X    Y  Z
    0  1  3  1.0  2
    1  0  1  1.0  3
    2  1  2  2.0  1
    

    Alternatively, leave NA as the string 'NA' and assign through a boolean mask:

    df.loc[df['Y'] == 'NA', 'Y'] = [1, 2]
    df
    Out[1380]: 
       W  X  Y  Z
    0  1  3  1  2
    1  0  1  1  3
    2  1  2  2  1
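
For reference, here is a minimal end-to-end sketch that combines the two ideas above and uses the to_impute list from the question. It is only one possible way to do this, not the definitive one. It assumes the placeholders are the only non-numeric entries in Y (pd.to_numeric with errors='coerce' turns anything unparseable into NaN) and that to_impute is ordered to match the missing rows from top to bottom. The name mask is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 'NA', 2], [0, 1, 1, 3], [1, 2, 'NA', 1]],
                  columns=['W', 'X', 'Y', 'Z'])
to_impute = [2, 1]  # assumed to be in the same order as the missing rows

# Coerce the column to numeric; the 'NA' strings become NaN.
df['Y'] = pd.to_numeric(df['Y'], errors='coerce')

# Boolean mask of the rows that still need a value.
mask = df['Y'].isna()
if mask.sum() != len(to_impute):
    raise ValueError("number of imputed values does not match number of missing rows")

# Positional assignment: the i-th value goes to the i-th missing row.
df.loc[mask, 'Y'] = to_impute
print(df)
```

Because Y ends up with a float dtype here, the imputed values are stored as 2.0 and 1.0. If integers are wanted and no NaN remains, df['Y'].astype(int) can be applied afterwards.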