I'm using the scikit-learn "permutation_test_score" method to evaluate the significance of my estimator's performance. Unfortunately, I cannot tell from the scikit-learn documentation whether the method applies any scaling to the data. I usually standardise my data with a StandardScaler, fitting it on the training set and applying the same transformation to the test set.
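To be explicit, this is the kind of manual scaling I mean (a minimal sketch with an arbitrary train/test split, not my actual pipeline):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics computed on the training set only
X_test_scaled = scaler.transform(X_test)        # same training-set statistics reused on the test set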
Here is an example from the documentation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import permutation_test_score
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
n_classes = np.unique(y).size
# Some uncorrelated noisy features
random = np.random.RandomState(seed=0)
E = random.normal(size=(len(X), 2200))
# Add noisy data to the informative features to make the task harder
X = np.c_[X, E]
svm = SVC(kernel='linear')
cv = StratifiedKFold(2)
score, permutation_scores, pvalue = permutation_test_score(
    svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)
permutation_test_score does not apply any scaling itself; it simply fits whatever estimator you give it on each cross-validation split. If you want the StandardScaler to be fitted on the training folds only, pass it a pipeline where you apply the scaling. Example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X, y and cv as defined in your snippet above
pipe = Pipeline([('scaler', StandardScaler()), ('clf', SVC(kernel='linear'))])
score, permutation_scores, pvalue = permutation_test_score(
    pipe, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)
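If you want to convince yourself that the scaler really is refitted on the training folds of each split, one quick check (a sketch reusing the pipe, X, y and cv from above, using cross_validate rather than permutation_test_score, since the latter does not return the fitted estimators) is:

from sklearn.model_selection import cross_validate

# Each returned pipeline was fitted on its own training folds, so the
# scaler statistics differ from fold to fold.
results = cross_validate(pipe, X, y, cv=cv, return_estimator=True)
for fitted_pipe in results["estimator"]:
    print(fitted_pipe.named_steps["scaler"].mean_[:4])  # per-fold training means of the iris features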