I would like to create a class that uses sklearn
transformation methods. I found this article and I am using it as an example.
from sklearn import preprocessing
from sklearn.base import TransformerMixin
def minmax(dataframe):
minmax_transformer = preprocessing.MinMaxScaler()
return minmax_tranformer
class FunctionFeaturizer(TransformerMixin):
def __init__(self, scaler):
self.scaler = scaler
def fit(self, X, y=None):
return self
def transform(self, X):
fv = self.scaler(X)
return fv
if __name__=="__main__":
scaling = FunctionFeaturizer(minmax)
df = pd.DataFrame({'feature': np.arange(10)})
df_scaled = scaling.fit(df).transform(df)
print(df_scaled)
The output is StandardScaler(copy=True, with_mean=True, with_std=True)
which is actually the result of the preprocessing.StandardScaler().fit(df)
if I use it out of the class.
What I am expecting is:
array([[0. ],
[0.11111111],
[0.22222222],
[0.33333333],
[0.44444444],
[0.55555556],
[0.66666667],
[0.77777778],
[0.88888889],
[1. ]])
I am feeling that I am mixing few things here but I do not know what.
Update I did some modifications:
def minmax():
return preprocessing.MinMaxScaler()
class FunctionFeaturizer(TransformerMixin):
def __init__(self, scaler):
self.scaler = scaler
def fit(self, X, y=None):
return self
def fit_transform(self, X):
self.scaler.fit(X)
return self.scaler.transform(X)
if __name__=="__main__":
scaling = FunctionFeaturizer(minmax)
df = pd.DataFrame({'feature': np.arange(10)})
df_scaled = scaling.fit_transform(df)
print(df_scaled)
But now I am receiving the following error:
Traceback (most recent call last):
File "C:/my_file.py", line 33, in <module>
test_scale = scaling.fit_transform(df)
File "C:/my_file.py", line 26, in fit_transform
self.scaler.fit(X)
AttributeError: 'function' object has no attribute 'fit'
in your code you have:
if __name__=="__main__":
scaling = FunctionFeaturizer(minmax)
df = pd.DataFrame({'feature': np.arange(10)})
df_scaled = scaling.fit_transform(df)
print(df_scaled)
change the line
scaling = FunctionFeaturizer(minmax)
to
scaling = FunctionFeaturizer(minmax())
you need to call the function to get the instantiation of MinMaxScaler returned to you.
Instead of implementing fit
and fit_transform
, implement fit
and transform
unless you can optimize both process into fit_tranform
. This way, it is clearer what you are doing.
If you implement only fit
and transform
, you can still call fit_transform
because you extend the TransformerMixin
class. It will just call both functions in a row.
Your transformer is looking at every column of your dataset and distributing the values linearly between 0
and 1
.
So, to get your expected results, it will really depend on what your df
looks like. However, you did not share that with us, so it is difficult to tell if you will get it.
However, if you have df = [[0],[1],[2],[3],[4],[5],[6],[7],[8],[9]]
, you will see your expected result.
if __name__=="__main__":
scaling = FunctionFeaturizer(minmax())
df = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
df_scaled = scaling.fit_transform(df)
print(df_scaled)
> [[0. ]
> [0.11111111]
> [0.22222222]
> [0.33333333]
> [0.44444444]
> [0.55555556]
> [0.66666667]
> [0.77777778]
> [0.88888889]
> [1. ]]