machine-learning tensorflow scikit-learn sklearn-pandas imputation

Sklearn: Categorical Imputer?

Is there a way to impute categorical values using a sklearn.preprocessing object? I would like to ultimately create a preprocessing object that I can apply to new data, so that it is transformed the same way as the old data.

I am looking for a way to do it so that I can use it this way.

Solution

Copying and modifying this answer, I made an imputer for a pandas.Series object:

import numpy
import pandas 

from sklearn.base import TransformerMixin


class SeriesImputer(TransformerMixin):
    """Impute missing values in a pandas.Series.

    If the Series is of dtype object, impute with the most frequent value.
    If the Series is not of dtype object, impute with the mean.
    """

    def fit(self, X, y=None):
        if X.dtype == numpy.dtype('O'):
            # value_counts() sorts by frequency, so index[0] is the most frequent value
            self.fill = X.value_counts().index[0]
        else:
            self.fill = X.mean()
        return self

    def transform(self, X, y=None):
        # Returns a new Series; X itself is not modified
        return X.fillna(self.fill)

To use it you would do:

# Make a series (numpy.nan rather than numpy.NaN, which was removed in NumPy 2.0)
s1 = pandas.Series(['k', 'i', 't', 't', 'e', numpy.nan])

a = SeriesImputer()    # Initialize the imputer
a.fit(s1)              # Fit the imputer
s2 = a.transform(s1)   # Get a new series
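
Since the goal in the question is to apply the same transformation to new data, here is a minimal sketch of that: fit on the old series once, then call transform on a different series. The names `old`, `new` and `nums` are made up for illustration, and the class is repeated only so the snippet runs on its own. I pass dtype=object explicitly on the assumption that string columns might otherwise be inferred as a dedicated string dtype, in which case the dtype check in fit would not take the most-frequent branch.

```python
import numpy
import pandas
from sklearn.base import TransformerMixin


class SeriesImputer(TransformerMixin):
    """Same imputer as above, repeated so this snippet is self-contained."""

    def fit(self, X, y=None):
        if X.dtype == numpy.dtype('O'):
            self.fill = X.value_counts().index[0]
        else:
            self.fill = X.mean()
        return self

    def transform(self, X, y=None):
        return X.fillna(self.fill)


# Hypothetical "old" (training) and "new" data; dtype=object is an assumption,
# made so the most-frequent branch is taken regardless of string dtype inference
old = pandas.Series(['k', 'i', 't', 't', 'e', numpy.nan], dtype=object)
new = pandas.Series(['a', numpy.nan, 'b'], dtype=object)

imputer = SeriesImputer().fit(old)    # learns the fill value from the old data only
new_filled = imputer.transform(new)   # reuses that fill value on the new data

# Non-object series take the mean branch instead
nums = pandas.Series([1.0, 2.0, numpy.nan, 6.0])
nums_filled = SeriesImputer().fit(nums).transform(nums)
```

The fill value is stored on the imputer at fit time, so the missing entry in `new` gets the most frequent value of `old` rather than anything computed from `new` itself.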