Is there a way to impute categorical values using a sklearn.preprocessing object? I would like to ultimatly create a preprocessing object which I can apply to new data and have it transformed the same way as old data.
I am looking for a way to do it so that I can use it this way.
Copying and modifying this answer, I made an imputer for a pandas.Series object
import numpy
import pandas
from sklearn.base import TransformerMixin
class SeriesImputer(TransformerMixin):
def __init__(self):
"""Impute missing values.
If the Series is of dtype Object, then impute with the most frequent object.
If the Series is not of dtype Object, then impute with the mean.
"""
def fit(self, X, y=None):
if X.dtype == numpy.dtype('O'): self.fill = X.value_counts().index[0]
else : self.fill = X.mean()
return self
def transform(self, X, y=None):
return X.fillna(self.fill)
To use it you would do:
# Make a series
s1 = pandas.Series(['k', 'i', 't', 't', 'e', numpy.NaN])
a = SeriesImputer() # Initialize the imputer
a.fit(s1) # Fit the imputer
s2 = a.transform(s1) # Get a new series