Tags: python-3.x, oop, scikit-learn, sklearn-pandas

Sklearn method in class


I would like to create a class that uses sklearn transformation methods. I found this article and I am using it as an example.

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.base import TransformerMixin

def minmax(dataframe):
    # Builds a new (unfitted) MinMaxScaler and returns the scaler object
    minmax_transformer = preprocessing.MinMaxScaler()
    return minmax_transformer


class FunctionFeaturizer(TransformerMixin):
    def __init__(self, scaler):
        self.scaler = scaler

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Calls minmax(X), i.e. whatever the stored scaler function returns
        fv = self.scaler(X)
        return fv

if __name__ == "__main__":
    scaling = FunctionFeaturizer(minmax)
    df = pd.DataFrame({'feature': np.arange(10)})
    df_scaled = scaling.fit(df).transform(df)
    print(df_scaled)

The output is the scaler object itself, StandardScaler(copy=True, with_mean=True, with_std=True), which is what preprocessing.StandardScaler().fit(df) returns if I use it outside the class.
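A minimal sketch of the behaviour described above, assuming a standard scikit-learn install: calling fit on a scaler returns the scaler object itself (so printing it shows the estimator), while transform returns the scaled NumPy array. The variable names scaler, fitted and scaled are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'feature': np.arange(10)})

scaler = MinMaxScaler()
fitted = scaler.fit(df)        # fit() returns the estimator itself...
print(fitted is scaler)        # ...so this prints True

scaled = scaler.transform(df)  # transform() returns the scaled values as an array
print(scaled.ravel())
```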

What I am expecting is:

array([[0.        ],
       [0.11111111],
       [0.22222222],
       [0.33333333],
       [0.44444444],
       [0.55555556],
       [0.66666667],
       [0.77777778],
       [0.88888889],
       [1.        ]])

I feel that I am mixing up a few things here, but I do not know what.

Update: I made some modifications:

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.base import TransformerMixin

def minmax():
    return preprocessing.MinMaxScaler()

class FunctionFeaturizer(TransformerMixin):
    def __init__(self, scaler):
        self.scaler = scaler

    def fit(self, X, y=None):
        return self

    def fit_transform(self, X, y=None):
        # Fit the stored scaler, then scale the same data
        self.scaler.fit(X)
        return self.scaler.transform(X)

if __name__ == "__main__":
    scaling = FunctionFeaturizer(minmax)
    df = pd.DataFrame({'feature': np.arange(10)})
    df_scaled = scaling.fit_transform(df)
    print(df_scaled)

But now I am receiving the following error:

Traceback (most recent call last):
  File "C:/my_file.py", line 33, in <module>
    test_scale = scaling.fit_transform(df)
  File "C:/my_file.py", line 26, in fit_transform
    self.scaler.fit(X)
AttributeError: 'function' object has no attribute 'fit'

Solution

  • Solving your error

    In your code you have:

    if __name__=="__main__":
        scaling = FunctionFeaturizer(minmax)
        df = pd.DataFrame({'feature': np.arange(10)})
        df_scaled = scaling.fit_transform(df)
        print(df_scaled)
    

    Change the line

    scaling = FunctionFeaturizer(minmax)
    

    to

    scaling = FunctionFeaturizer(minmax())
    

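    You need to call the function so that it returns a MinMaxScaler instance. Passing minmax without parentheses hands the class the function object itself, and a plain function has no fit method, hence the AttributeError.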
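A quick way to see the difference, reusing the minmax helper from the question: the function object has no fit attribute, while the MinMaxScaler it returns does.

```python
from sklearn import preprocessing

def minmax():
    return preprocessing.MinMaxScaler()

# Without parentheses you get the function object, which has no .fit
print(hasattr(minmax, "fit"))    # False
# Calling it returns a MinMaxScaler instance, which has .fit and .transform
print(hasattr(minmax(), "fit"))  # True
```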

    Suggestion

    Instead of implementing fit and fit_transform, implement fit and transform, unless you can optimize the two steps into a single fit_transform. This way, it is clearer what you are doing.

    If you implement only fit and transform, you can still call fit_transform because your class extends TransformerMixin, whose default fit_transform simply calls fit and then transform.
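A minimal sketch of that suggestion, not the only way to write it: the class implements only fit and transform and inherits fit_transform from TransformerMixin. Two choices go beyond the original code and are assumptions: the scaler instance is passed in directly (MinMaxScaler() instead of the minmax() helper), and fit works on a copy made with sklearn.base.clone, stored as scaler_, following the scikit-learn convention of not modifying constructor arguments. BaseEstimator is added so get_params/set_params work.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import MinMaxScaler


class FunctionFeaturizer(TransformerMixin, BaseEstimator):
    """Wraps any scikit-learn transformer passed in as `scaler`."""

    def __init__(self, scaler):
        # Store the constructor argument unchanged (scikit-learn convention)
        self.scaler = scaler

    def fit(self, X, y=None):
        # Fit a fresh copy so the caller's scaler object is left untouched;
        # the trailing underscore marks an attribute learned during fit
        self.scaler_ = clone(self.scaler).fit(X, y)
        return self

    def transform(self, X):
        return self.scaler_.transform(X)


if __name__ == "__main__":
    scaling = FunctionFeaturizer(MinMaxScaler())
    df = pd.DataFrame({'feature': np.arange(10)})
    # fit_transform comes from TransformerMixin: it calls fit(), then transform()
    print(scaling.fit_transform(df))
```

Because the wrapped scaler is cloned in fit, the same MinMaxScaler() object can be handed to several FunctionFeaturizer instances without them overwriting each other's fitted state.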

    Getting your expected results

    Your transformer (MinMaxScaler) looks at each column of your dataset independently and rescales its values linearly, so that the column minimum maps to 0 and the column maximum maps to 1.
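To make the rescaling concrete, the sketch below computes (x - min) / (max - min) per column by hand and checks it against MinMaxScaler with its default feature_range of (0, 1). The two-column array X is made-up example data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0., 10.], [5., 20.], [9., 40.]])  # hypothetical two-column data

# Min-max scaling, column by column: (x - column min) / (column max - column min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

scaled = MinMaxScaler().fit_transform(X)
print(np.allclose(manual, scaled))  # True
```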

    So the result depends on what df contains: each column is scaled using its own minimum and maximum.

    With a single column holding the values 0 through 9, such as df = [[0],[1],[2],[3],[4],[5],[6],[7],[8],[9]] or the pd.DataFrame({'feature': np.arange(10)}) from your question, you will see your expected result.

    if __name__=="__main__":
        scaling = FunctionFeaturizer(minmax())
        df = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
        df_scaled = scaling.fit_transform(df)
        print(df_scaled)
    
    > [[0.        ]
    >  [0.11111111]
    >  [0.22222222]
    >  [0.33333333]
    >  [0.44444444]
    >  [0.55555556]
    >  [0.66666667]
    >  [0.77777778]
    >  [0.88888889]
    >  [1.        ]]
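As a sanity check, the pd.DataFrame({'feature': np.arange(10)}) from the question and the nested list used above produce the same scaled values. A minimal comparison, assuming scikit-learn's default (NumPy array) output:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'feature': np.arange(10)})  # the DataFrame from the question
as_list = [[i] for i in range(10)]             # the nested list used above

from_df = MinMaxScaler().fit_transform(df)
from_list = MinMaxScaler().fit_transform(as_list)
print(np.allclose(from_df, from_list))  # True
```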