python pandas sklearn-pandas imputation

Using a Mask to Insert Values from sklearn Iterative Imputer


I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to write the imputed values back into my DataFrame in place of the missing ones. This is what I tried:

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

df_post_copy = df_post.copy()
missing_mask = df_post_copy.isna()
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)
df_post_copy[missing_mask] = imputed_values[missing_mask]

Results in:

ValueError: other must be the same shape as self when an ndarray

But the shape matches...

imputed_values.shape
(16494, 29)

The type is:

type(imputed_values)
numpy.ndarray

Since the shape is right, I tried converting the array to a pandas DataFrame:

test_imputed_values = pd.DataFrame(imputed_values)

When I try:

df_post_copy[missing_mask] = test_imputed_values[missing_mask]

I get the same error as above.

How do I use a mask to insert the imputed values where needed?


Solution

  • imputer.fit_transform(...) returns the complete array: the values that were already present together with the imputed values for the cells that were missing. No mask is needed to write it back. If you want an updated DataFrame, something like

    imputed_values = imputer.fit_transform(df_post_copy)
    df_post_copy.loc[:, :] = imputed_values
    

    should work.
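
If you do want to go through the mask, here is a minimal sketch. The likely reason the original assignment fails is that indexing a 2-D ndarray with a 2-D boolean mask returns a flat 1-D array of the selected cells, so the right-hand side no longer has the frame's shape. Wrapping the imputed array in a DataFrame with the same index and columns, and then using DataFrame.mask, avoids that. The small random df_post below is a hypothetical stand-in for the real data, and imputed_df and df_filled are names made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables IterativeImputer
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for df_post: numeric columns with a few NaNs.
rng = np.random.default_rng(0)
df_post = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
df_post.iloc[::7, 1] = np.nan
df_post.iloc[3::11, 3] = np.nan

df_post_copy = df_post.copy()
missing_mask = df_post_copy.isna()

imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)

# Boolean indexing on the ndarray flattens the selection to 1-D,
# which is why it cannot be assigned back through a 2-D mask.
flat_selection = imputed_values[missing_mask.to_numpy()]

# Give the imputed array the same labels as the original frame.
imputed_df = pd.DataFrame(
    imputed_values,
    index=df_post_copy.index,
    columns=df_post_copy.columns,
)

# Where the mask is True, take the imputed value; elsewhere keep the original.
df_filled = df_post_copy.mask(missing_mask, imputed_df)
```

Because the imputer leaves observed cells as they are, df_filled should hold the same values as the `.loc[:, :]` assignment above; the mask version only makes the intent explicit. Note that IterativeImputer drops columns that are entirely NaN by default, in which case the array would have fewer columns than the frame and neither approach would line up.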