Tags: python, scikit-learn, data-science, pipeline, cross-validation

How to create a scikit-learn pipeline that applies z-score and cross-validation?


I am trying to normalize my data within each fold of the cross-validation, and I came across this question.

As suggested, I went to the scikit-learn documentation and found this example:

from sklearn import preprocessing, svm
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
cross_val_score(clf, X, y, cv=cv)

This does look like what I am trying to achieve. However, I want to use a z-score instead of the StandardScaler, so I tried this:

clf = make_pipeline(stats.zscore(), DecisionTreeClassifier())

But I get an error saying this:

TypeError: zscore() missing 1 required positional argument: 'a'

What should be the argument of zscore()?


Solution

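  • Welcome to Stack Overflow! There are several ways to use custom functionality in sklearn pipelines. I think FunctionTransformer could fit your case.

    The error occurs because stats.zscore() calls the function immediately, with no data, whereas make_pipeline expects transformer objects. Wrap zscore in a FunctionTransformer (passing the function itself, without parentheses) and pass that transformer to make_pipeline instead of calling zscore directly.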

    I hope this helps!
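
    A minimal sketch of that suggestion, assuming scipy and scikit-learn are installed. The iris dataset and cv=5 are placeholders chosen here for illustration, not part of the original question; substitute your own X, y and cv.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption): iris stands in for the asker's X and y.
X, y = load_iris(return_X_y=True)

# Pass the function object itself; FunctionTransformer will call
# stats.zscore(X) on each array it receives (column-wise, axis=0 by default).
zscorer = FunctionTransformer(stats.zscore)

clf = make_pipeline(zscorer, DecisionTreeClassifier(random_state=0))
scores = cross_val_score(clf, X, y, cv=5)  # cv=5 is an arbitrary choice
print(scores)

# On a single array, StandardScaler and zscore compute the same thing:
# both subtract the column mean and divide by the population std (ddof=0).
same = np.allclose(StandardScaler().fit_transform(X), stats.zscore(X))
print(same)
```

    One caveat on this design: a FunctionTransformer built this way is stateless, so each test fold is standardized with its own mean and standard deviation rather than with the statistics of the training fold. StandardScaler already computes a z-score and does learn its statistics from the training fold only, so it may be the better fit if that distinction matters for your use case.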