Tags: python, scikit-learn, pca

Does sklearn PCA fit_transform() center input variables?


Question in the title. After calling pca.fit(X), suppose I called pca.fit_transform(new_X). Is new_X automatically centered by PCA? The documentation is unclear on this point.


Solution

  • From the docs:

    Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

    https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

    fit_transform is equivalent to running fit and then transform on the same input matrix (up to floating-point differences). fit computes the per-feature means and stores them in the mean_ attribute; transform subtracts those stored means before projecting the data onto the principal components.

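    So the input is always centered, but with the means of whatever data the model was last fitted on. Calling pca.fit_transform(new_X) refits the model, so new_X is centered on its own means and everything learnt by the earlier pca.fit(X) is discarded. To fit on one matrix and apply the centering learnt from it to another (for example, when applying a model fitted on a training set to a test/validation set), call fit on the first matrix and then transform on the second.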
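A quick way to check the "centered but not scaled" behaviour is to reproduce transform by hand. The sketch below assumes the default whiten=False and forces svd_solver="full" so the decomposition is deterministic; the toy data is made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data with a deliberately non-zero mean in every feature (illustrative only)
X = rng.normal(loc=[5.0, -3.0, 10.0], scale=[1.0, 2.0, 0.5], size=(100, 3))

pca = PCA(n_components=2, svd_solver="full")
pca.fit(X)

# fit() stores the per-feature means that will be used for centering
assert np.allclose(pca.mean_, X.mean(axis=0))

# transform() subtracts the stored means and projects onto the components.
# With whiten=False (the default) there is no division by a standard deviation.
manual = (X - pca.mean_) @ pca.components_.T
assert np.allclose(pca.transform(X), manual)

# fit_transform() on the same matrix matches fit() followed by transform()
scores = PCA(n_components=2, svd_solver="full").fit_transform(X)
assert np.allclose(scores, pca.transform(X))
```

If the features should also be scaled to unit variance, that has to be done separately (for example with a StandardScaler placed before PCA in a pipeline).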
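To see the difference between transform and fit_transform on new data, here is a minimal sketch with a made-up "training" matrix and a "new" matrix whose mean is shifted. It assumes svd_solver="full" for determinism; the variable names are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X_train = rng.normal(loc=0.0, size=(200, 4))
# "New" data with a shifted mean, e.g. a later batch (illustrative only)
X_new = rng.normal(loc=3.0, size=(50, 4))

pca = PCA(n_components=2, svd_solver="full").fit(X_train)
train_mean = pca.mean_.copy()

# transform() reuses the centering learnt from X_train
projected = pca.transform(X_new)
assert np.allclose(projected, (X_new - train_mean) @ pca.components_.T)

# fit_transform() refits the model on X_new, so mean_ (and components_)
# are re-estimated from X_new and the X_train fit is discarded
refit_scores = pca.fit_transform(X_new)
assert np.allclose(pca.mean_, X_new.mean(axis=0))
assert not np.allclose(pca.mean_, train_mean)

# The refit scores are centered on X_new's own mean
assert np.allclose(refit_scores.mean(axis=0), 0.0)
```

For a train/test split, the first pattern (fit on the training data, transform the test data) is the one that keeps the test set from influencing the centering.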