PCA: subtracting the mean before projecting and adding it back afterwards


Bottom line question: When I use my computed PCA projection matrix P to project a vector v into the other (possibly lower-dimensional) space, should I first subtract from v the mean of the vectors that were used to build the covariance matrix whose principal eigenvectors form P?

A follow-up question: If the answer to the question above is "yes", then when I project a "reduced" vector back to the original space, should I add the same mean to the result?

Now the detailed question, including the steps that may cause confusion:

The PCA flow goes as follows:

  1. Take m vectors of length d and compute their covariance matrix. The element in position (i,j) is the covariance between the i'th and j'th dimensions across all m vectors, so we can obtain the (dxd) covariance matrix by subtracting the mean from every vector, placing the mean-subtracted vectors as the columns of a matrix A of size (dxm), and computing C = AA' (up to a constant factor of 1/(m-1), which does not change the eigenvectors).

  2. Compute the d eigenvalues and eigenvectors of C and, for some pre-selected k, create a matrix P of size (kxd) whose rows are the k eigenvectors corresponding to the largest eigenvalues, in descending order of eigenvalue.

  3. For any vector v of the original dimension d that we want to project to the possibly reduced dimension k, compute u = Pv, which produces a vector of dimension k.

  4. For any vector u that was already projected to the possibly reduced dimension k, if we want to project it back (after possible loss of data) to the original dimension d, compute v = P'u, which produces a vector of dimension d.
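Steps (1) and (2) above can be sketched in NumPy roughly as follows. This is only an illustration under assumptions: the function name `fit_pca`, the variable names, and the random test data are hypothetical and not part of the original question, and the covariance is left as C = AA' exactly as written in step (1).

```python
import numpy as np

def fit_pca(X, k):
    """Hypothetical helper. X is a (d, m) array holding m sample vectors as columns.

    Returns the mean as a (d, 1) column and the (k, d) projection matrix P.
    """
    mean = X.mean(axis=1, keepdims=True)   # step (1): mean of the m vectors
    A = X - mean                           # mean-subtracted vectors as columns, (d, m)
    C = A @ A.T                            # (d, d); proportional to the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues, descending
    P = eigvecs[:, order].T                # step (2): eigenvectors as rows, (k, d)
    return mean, P

# Made-up data: m = 200 vectors of length d = 5, deliberately offset from the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200)) + 10.0
mean, P = fit_pca(X, k=2)
```

The rows of P are orthonormal, so P @ P.T is the (kxk) identity, which is why P' can serve as the back-projection in step (4).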

The question is:

  • in step (3), should we first subtract from v the mean we computed in step (1)?
  • in step (4), should we finally add to v the mean we computed in step (1)?

Solution

  • Found the answer in some tutorials. Sharing it here so anyone can enjoy...

    According to this very nice and friendly PCA tutorial, the mean should indeed be subtracted in (3) and added in (4). It also appears to be described in the same way in the very famous and classic eigenfaces paper.

    Here are the relevant screenshots from the PCA tutorial:

    Projecting to new space: [screenshot not preserved]

    Projecting back to original space: [screenshot not preserved]
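A minimal sketch of what the answer describes, written in NumPy: subtract the mean before multiplying by P, and add it back after multiplying by P'. The function names `project` and `reconstruct` and the random data are assumptions made for illustration only; they do not come from the tutorial or the eigenfaces paper.

```python
import numpy as np

def project(v, P, mean):
    """Step (3) with mean subtraction: u = P (v - mean)."""
    return P @ (v - mean)

def reconstruct(u, P, mean):
    """Step (4) with mean addition: v = P' u + mean."""
    return P.T @ u + mean

# Made-up data: m = 100 vectors of length d = 4, offset from the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100)) + 5.0
mean = X.mean(axis=1)                       # (d,)
A = X - mean[:, None]                       # mean-subtracted columns
eigvals, eigvecs = np.linalg.eigh(A @ A.T)  # eigenvalues in ascending order
P_full = eigvecs[:, ::-1].T                 # all d eigenvectors as rows, descending order
P = P_full[:2]                              # keep the k = 2 leading components

v = X[:, 0]
u = project(v, P, mean)                     # reduced vector of length k
v_approx = reconstruct(u, P, mean)          # lossy approximation of v, length d

# With k = d nothing is discarded, so the round trip recovers v up to rounding.
v_back = reconstruct(project(v, P_full, mean), P_full, mean)
```

One way to see why the mean belongs in both places: the eigenvectors describe directions of variation around the mean, so the mean itself projects to the zero vector, and reconstructing the zero vector returns the mean.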