I am working on a classification problem in biometrics. I compare each probe in the testing set against the gallery using the Euclidean distance.
Every time I run the code I get different results. If I remove the scaler, I always get the same results.
Why does the scaler produce different values? (The difference is slight: sometimes it recognizes 10 more probes, sometimes 10 fewer.) Thanks to all who answer.
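from numpy import load  # assuming load is numpy.load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then reuse it for the testing data
scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)
# Fit PCA on the scaled training data and project both sets
pca = PCA(n_components=50).fit(training_scaled)
training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)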
The only thing I can suspect is that the arpack or randomized solver is being used behind the scenes in your case, since the solver is selected automatically (svd_solver defaults to 'auto'). In that case, you need to fix the random seed in order to reproduce the results.
Try fixing the random seed by passing a value to the random_state argument of the PCA instance.
from numpy import load  # assuming load is numpy.load
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

myseed = 0
scaler = StandardScaler()
training_walks_matrix = load('training_imputeZero.npy')
training_scaled = scaler.fit_transform(training_walks_matrix)
testing_walks_matrix = load('testing_imputeZero.npy')
testing_scaled = scaler.transform(testing_walks_matrix)
# Fix the seed here so that a randomized solver gives the same result on every run
pca = PCA(n_components=50, random_state=myseed).fit(training_scaled)
training_walks_matrix = pca.transform(training_scaled)
testing_walks_matrix = pca.transform(testing_scaled)
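To check this on your side, here is a minimal sketch on synthetic data. The matrix shape, the random data, and the helper name project are made up for illustration and are not from your code, and svd_solver="randomized" is forced explicitly so the effect does not depend on what 'auto' picks. It compares two PCA fits with the same random_state and confirms that StandardScaler, which has no random component, returns the same output when fitted twice on the same data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training matrix (hypothetical shape and values).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))

# StandardScaler is deterministic: fitting twice on the same data agrees.
X_scaled = StandardScaler().fit_transform(X)
scaler_match = np.array_equal(X_scaled, StandardScaler().fit_transform(X))


def project(seed):
    """Fit PCA with the randomized solver and a given seed, return the projection."""
    pca = PCA(n_components=50, svd_solver="randomized", random_state=seed)
    return pca.fit_transform(X_scaled)


# Same seed on the same data: the projections should agree.
same_seed_match = np.allclose(project(0), project(0))

# Different seeds: report how far apart the projections are.
different_seed_gap = np.abs(project(0) - project(1)).max()

print("scaler repeatable:", scaler_match)
print("same seed gives matching projection:", same_seed_match)
print("max abs difference between seeds 0 and 1:", different_seed_gap)
```

If the projections only match when the seed is fixed, the variation comes from PCA rather than from the scaler. Passing svd_solver="full" is another way to avoid the randomized path, at the cost of a slower fit on large matrices.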