
PCA - taking difference with mean


What is the intuition behind Number 1 and Number 2 with respect to the mean? And how does the choice affect performance and accuracy?

Number 1:

    from sklearn import decomposition

    pca = decomposition.PCA(n_components=4)
    X_centered = X - X.mean(axis=0)  # subtract each column's mean
    pca.fit(X_centered)
    X_pca = pca.transform(X_centered)

Number 2:

    from sklearn import decomposition

    pca = decomposition.PCA(n_components=4)
    pca.fit(X)  # no manual centering
    X_pca = pca.transform(X)

Thanks in advance


Solution

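  • It will be the same. PCA finds a set of orthogonal basis vectors that successively maximize the variance of the points' projections onto them. Variance does not change when every point is shifted by the same vector, so the principal directions are translation invariant (and rotating the data simply rotates them along with it). Subtracting the mean is exactly such a shift, so you get the same components whether or not you center the matrix first. In fact, scikit-learn's PCA centers the input itself during fit (it stores the column means in its mean_ attribute) and subtracts mean_ again in transform. Number 1 therefore only adds a redundant pass over X and an extra copy of it; any difference in the results is floating-point rounding, not a change in accuracy.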
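A quick way to check this is to run both variants side by side and compare the fitted models. The sketch below assumes scikit-learn and NumPy are installed; the data X is made up for illustration (correlated features shifted far from zero so the mean actually matters), and svd_solver="full" is set only to keep the comparison deterministic. Component signs are compared via absolute values because SVD leaves the sign of each component arbitrary.

```python
import numpy as np
from sklearn import decomposition

# Hypothetical data: 200 samples, 6 correlated features, shifted well away from zero
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6)) + 50.0

# Number 1: center manually, then fit and transform
pca1 = decomposition.PCA(n_components=4, svd_solver="full")
X_centered = X - X.mean(axis=0)
X_pca1 = pca1.fit(X_centered).transform(X_centered)

# Number 2: let PCA center the data internally
pca2 = decomposition.PCA(n_components=4, svd_solver="full")
X_pca2 = pca2.fit(X).transform(X)

# The stored means differ: roughly zero for Number 1, the real column means for Number 2
print(np.round(pca1.mean_, 6))
print(np.round(pca2.mean_, 2))

# Explained variances, components and projections agree up to floating-point error
# (absolute values absorb the arbitrary sign of each component)
same_variance = np.allclose(pca1.explained_variance_, pca2.explained_variance_)
same_components = np.allclose(np.abs(pca1.components_), np.abs(pca2.components_))
same_projection = np.allclose(np.abs(X_pca1), np.abs(X_pca2))
print(same_variance, same_components, same_projection)
```

In practice Number 2 is also the less error-prone choice: pca.transform(X_new) automatically subtracts the training mean stored in pca.mean_, whereas with Number 1 you have to remember to subtract the original X.mean(axis=0) from any new data yourself.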