Suppose I have a data frame with some missing values, as below:
import pandas as pd
df = pd.DataFrame([[1,3,'NA',2], [0,1,1,3], [1,2,'NA',1]], columns=['W', 'X', 'Y', 'Z'])
print(df)
The variable Y is missing two values. Say I run some imputation model and come up with an estimate of what the two values should be:
to_impute = [2,1]
What is the best way of replacing the two NA's with those two values? I know of ways that are fairly roundabout, e.g. looping over to_impute and using df.iloc to add each value. But I'm hoping there is a concise and non-iterative way.
(This is something that is easy in R, and I'm hoping it can be easy in Pandas.)
In pandas NA should be NaN, 1st you need to replace
it , then we can using fillna
df.Y=df.Y.replace('NA',np.nan)
df.Y=df.Y.fillna(pd.Series([1,2],index=df.index[df.Y.isnull()]))
df
Out[1375]:
W X Y Z
0 1 3 1.0 2
1 0 1 1.0 3
2 1 2 2.0 1
Let us treat your NA as str
df.loc[df.Y=='NA','Y']=[1,2]
df
Out[1380]:
W X Y Z
0 1 3 1 2
1 0 1 1 3
2 1 2 2 1