scikit-learn, centering, pca

Is standardized scaling a pre-requisite for applying PCA using sklearn?


I have a set of 70 input variables on which I need to perform PCA. As I understand it, standardizing the data so that each input variable has mean 0 and variance 1 is necessary for applying PCA.

I am having a hard time figuring out whether I need to apply preprocessing.StandardScaler() before passing my data set to PCA, or whether the PCA function in sklearn does this on its own.

If the latter is the case, then explained_variance_ratio_ should be the same whether or not I apply preprocessing.StandardScaler().

But the results are different, so I believe preprocessing.StandardScaler() is necessary before applying PCA. Is that correct?


Solution

  • Yes, that's true. scikit-learn's PCA does not standardize the input dataset; it only centers it by subtracting the per-feature mean.

    See also this post.
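
As a rough check of the point above, here is a minimal sketch on synthetic data (the array `X` and its feature scales are made up for illustration and are not the asker's 70 variables). It compares explained_variance_ratio_ for raw, manually centered, and standardized inputs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): three features with
# non-zero means and very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, -5.0, 100.0], scale=[1.0, 10.0, 100.0], size=(500, 3))

# PCA on the raw data and on manually centered data.
ratio_raw = PCA().fit(X).explained_variance_ratio_
ratio_centered = PCA().fit(X - X.mean(axis=0)).explained_variance_ratio_

# PCA on standardized data (mean 0, variance 1 per feature).
X_scaled = StandardScaler().fit_transform(X)
ratio_scaled = PCA().fit(X_scaled).explained_variance_ratio_

# Centering by hand changes nothing, because PCA already centers internally.
print("raw == centered:", np.allclose(ratio_raw, ratio_centered))
# Standardizing does change the result, because PCA does not rescale features.
print("raw == scaled:  ", np.allclose(ratio_raw, ratio_scaled))
```

With features on such different scales, the first comparison should agree and the second should not, which matches the behaviour described in the question. If standardization is wanted, one common option is to chain the two steps, for example with sklearn.pipeline.make_pipeline(StandardScaler(), PCA()).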