Tags: machine-learning, projection, pca

Why add the mean back in the reconstruction in PCA?


Suppose X is our dataset (not yet centered) and X_cent is our centered dataset (X_cent = X - mean(X)).

If we do the PCA projection as Z_cent = F*X_cent, where F is the matrix of principal components, it is pretty obvious that we need to add mean(X) back after reconstructing from Z_cent.

But what if we do the PCA projection as Z = F*X? In this case we don't need to add mean(X) back after reconstruction, but it gives a different result.

I think something is wrong with this projection/reconstruction procedure when it is applied to the non-centered data (X in our case). Can anyone explain how it works? Why can't we do the projection/reconstruction without subtracting/adding the mean?

Thank you in advance.
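
To make the comparison concrete, here is a minimal NumPy sketch of the two procedures using the notation above (the toy data, the choice K = 2, and building F from the top-K eigenvectors of the covariance are illustrative assumptions, not part of the question):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy, deliberately non-centered data: 3 features x 100 samples.
    X = rng.normal(loc=5.0, scale=2.0, size=(3, 100))
    X_mean = X.mean(axis=1, keepdims=True)
    X_cent = X - X_mean

    # F holds the top-K principal components, one per row,
    # so that Z = F*X matches the notation above.
    K = 2
    _, eigvecs = np.linalg.eigh(np.cov(X_cent))  # ascending eigenvalue order
    F = eigvecs[:, ::-1][:, :K].T                # K x 3

    # Version 1: project the raw data, reconstruct, no mean handling.
    R1 = F.T @ (F @ X)

    # Version 2: project the centered data, reconstruct, add the mean back.
    R2 = F.T @ (F @ X_cent) + X_mean

    print(np.allclose(R1, R2))   # False: with K < 3 the results differ
    print(np.linalg.norm(X - R1), np.linalg.norm(X - R2))

With K equal to the full dimensionality the two reconstructions would agree; the discrepancy only appears once components are dropped, which is exactly the situation analyzed in the solution below.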


Solution

  • If you retain all principal components, then the reconstructions of the centered and non-centered vectors as described in your question would be identical. The problem (as indicated in your comments) is that you are only retaining K principal components. When you drop PCs, you lose information, so the reconstruction will contain errors. Since one of the two reconstructions doesn't have to reconstruct the mean, it introduces no error w.r.t. the mean, so the reconstruction errors of the two versions will be different.

    Reconstruction with fewer than all PCs isn't quite as simple as multiplying by the transpose of the eigenvectors (F'), because you need to pad your transformed data with zeros, but to keep things simple I'll ignore that here. Your two reconstructions look like this:

    R1 = F'*F*X
    R2 = F'*F*X_cent + X_mean
       = F'*F*(X - X_mean) + X_mean
       = F'*F*X - F'*F*X_mean + X_mean
    

    Since the reconstruction is lossy, in general F'*F*Y != Y for a matrix Y. If you retained all PCs, you would have R1 - R2 = 0. But since you are only retaining a subset of the PCs, your two reconstructions will differ by

    R2 - R1 = X_mean - F'*F*X_mean
    

    Your follow-up question in the comments, regarding why it's better to reconstruct X_cent instead of X, is a bit more nuanced and really depends on why you are doing PCA in the first place. The most fundamental reason is that the PCs are defined with respect to the mean, so by not centering the data prior to transforming/rotating, you aren't really decorrelating the features. Another reason is that the numeric values of the transformed data are often orders of magnitude smaller when the data is centered first. The R2 - R1 identity above is also easy to check numerically, as in the sketch below.
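
To verify the identity, here is a small self-contained sketch in the same hypothetical setup as the one in the question (toy data, K = 2, F from the top-K eigenvectors of the covariance; the error comparison in the last line is an extra observation, not something stated above):

    import numpy as np

    rng = np.random.default_rng(1)

    # Same toy setup as in the question's sketch.
    X = rng.normal(loc=5.0, scale=2.0, size=(3, 200))
    X_mean = X.mean(axis=1, keepdims=True)
    X_cent = X - X_mean

    K = 2
    _, eigvecs = np.linalg.eigh(np.cov(X_cent))  # ascending eigenvalue order
    F = eigvecs[:, ::-1][:, :K].T                # K x 3, one PC per row

    R1 = F.T @ (F @ X)                           # non-centered version
    R2 = F.T @ (F @ X_cent) + X_mean             # centered version

    # The per-column gap is exactly the part of the mean that the
    # retained PCs cannot represent: X_mean - F'*F*X_mean.
    gap = X_mean - F.T @ (F @ X_mean)
    print(np.allclose(R2 - R1, gap))             # True

    # The centered reconstruction is never worse: R1's residual contains
    # the same centered residual plus the mis-reconstructed mean.
    print(np.linalg.norm(X - R2) <= np.linalg.norm(X - R1))  # True

The last line holds because the columns of X_cent sum to zero, which makes the centered residual orthogonal to the constant mean residual; R1's squared error is therefore R2's squared error plus N times the squared norm of X_mean - F'*F*X_mean.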