I have a set of 70 input variables on which I need to perform PCA. As I understand it, standardizing the data so that each input variable has mean 0 and variance 1 is necessary before applying PCA.
I am having a hard time figuring out whether I need to apply preprocessing.StandardScaler() before passing my data set to PCA, or whether the PCA function in sklearn does this on its own.
If the latter is the case, then explained_variance_ratio_ should be the same regardless of whether I apply preprocessing.StandardScaler() first.
But the results are different, so I believe preprocessing.StandardScaler() is necessary before applying PCA. Is that true?
Yes, it's true: scikit-learn's PCA does not standardize the input dataset; it only centers it by subtracting the per-feature mean.
See also this post.
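To check this yourself, here is a minimal sketch on made-up data (the synthetic array, its feature scales, and the variable names are my own assumptions, not taken from your data set). It fits PCA on the raw data, on manually centered data, and on standardized data via a pipeline, then compares explained_variance_ratio_ across the three.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 samples, 3 independent features with
# very different scales and non-zero means.
rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 100.0])
offsets = np.array([5.0, -3.0, 50.0])
X = rng.normal(size=(200, 3)) * scales + offsets

# PCA on the raw data (PCA centers internally, but does not rescale).
pca_raw = PCA(n_components=3).fit(X)

# PCA on manually centered data: should match the raw fit.
pca_centered = PCA(n_components=3).fit(X - X.mean(axis=0))

# Standardize first, then PCA.
pipe = make_pipeline(StandardScaler(), PCA(n_components=3)).fit(X)
pca_scaled = pipe.named_steps["pca"]

ratio_raw = pca_raw.explained_variance_ratio_
ratio_centered = pca_centered.explained_variance_ratio_
ratio_scaled = pca_scaled.explained_variance_ratio_

print("raw:     ", ratio_raw)
print("centered:", ratio_centered)
print("scaled:  ", ratio_scaled)
```

On data like this, the raw and centered ratios should agree (centering is already done for you), while the standardized ratios should differ noticeably, because without scaling the feature with the largest variance dominates the first component. Whether you should standardize depends on whether your 70 variables are on comparable scales; if they are not, putting StandardScaler in front of PCA in a pipeline, as above, is one common way to do it.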