
Principal component analysis m-by-n matrix implementation


Does anyone know how to implement principal component analysis (PCA) on an m-by-n matrix in MATLAB for normalization?


Solution

  • Assuming each column is a sample (that is, you have n samples, each of dimension m) stored in a matrix A, you first have to subtract the mean sample, i.e. the mean of each row taken across the columns:

          Amm = bsxfun(@minus,A,mean(A,2));
    

    Then you want to do an eigenvalue decomposition of 1/size(Amm,2)*Amm*Amm' (you can use 1/(size(Amm,2)-1) as the scale factor instead if you want an interpretation as an unbiased covariance estimate) with:

          [v,d] = eig(1/size(Amm,2)*Amm*Amm');
    

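    The columns of v are your PCA vectors, and the diagonal entries of d are the corresponding variances. Note that eig does not necessarily return them sorted by decreasing variance, so sort them yourself if the order matters.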

    However, if m is huge this is not the best way to go, because forming and storing the m-by-m matrix Amm*Amm' is not practical. Instead, compute:

          [u,s,v] = svd(1/sqrt(size(Amm,2))*Amm,'econ');
    

    This time u contains your PCA vectors. The diagonal entries of s are the square roots of the corresponding entries of d, and svd returns them in decreasing order.
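
    As a rough check that the eig and svd routes agree, the sketch below runs both on random data and compares the results. The sizes m and n, the random matrix A, and the helper names lambda, sigma, max_var_diff and max_dir_diff are assumptions made for illustration only. The eigenvalues are sorted in descending order so that they line up with the output of svd.

    ```matlab
    rng(0);                                  % fixed seed, illustrative only
    m = 5; n = 200;                          % hypothetical sizes: n samples of dimension m
    A = randn(m, n);                         % hypothetical data, one sample per column

    Amm = bsxfun(@minus, A, mean(A, 2));     % subtract the mean sample

    % Route 1: eigendecomposition of the m-by-m covariance matrix
    C = (Amm * Amm') / n;
    [v, d] = eig(C);
    [lambda, order] = sort(diag(d), 'descend');   % largest variance first
    v = v(:, order);

    % Route 2: economy-size SVD of the scaled, centered data
    [u, s, ~] = svd(Amm / sqrt(n), 'econ');
    sigma = diag(s);

    % Squared singular values should match the eigenvalues
    max_var_diff = max(abs(sigma.^2 - lambda));

    % PCA vectors are only defined up to sign, so compare |u_k' * v_k| with 1
    max_dir_diff = max(abs(abs(sum(u .* v, 1)) - 1));
    ```

    Both differences should be at the level of floating-point round-off. The sign of each PCA vector is arbitrary, which is why the comparison uses absolute inner products rather than u - v.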

    Note: there's another way to go if m is huge, namely computing eig(1/size(Amm,2)*Amm'*Amm) (notice that the transposes are switched compared to the above) and doing a little trickery, but it's a longer explanation so I won't get into it.
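
    One common form of that trick, which may or may not be exactly what is meant here, is sketched below. It assumes n is much smaller than m. If w is an eigenvector of the n-by-n matrix 1/n*Amm'*Amm with eigenvalue lambda > 0, then Amm*w is an eigenvector of 1/n*Amm*Amm' with the same eigenvalue, and its norm is sqrt(n*lambda). The sizes, the random data, the tolerance 1e-10 and the names G, w, keep, residual and ortho_err are illustrative assumptions.

    ```matlab
    rng(0);                                  % fixed seed, illustrative only
    m = 1000; n = 20;                        % hypothetical sizes: many dimensions, few samples
    A = randn(m, n);                         % hypothetical data, one sample per column
    Amm = bsxfun(@minus, A, mean(A, 2));     % subtract the mean sample

    % Eigendecomposition of the small n-by-n matrix instead of the m-by-m one
    G = (Amm' * Amm) / n;
    [w, d] = eig(G);
    [lambda, order] = sort(diag(d), 'descend');
    w = w(:, order);

    % Centering removes one degree of freedom, so drop (near-)zero eigenvalues
    keep = lambda > 1e-10 * lambda(1);       % assumed tolerance
    lambda = lambda(keep);
    w = w(:, keep);

    % Map back to m dimensions and scale each column to unit length
    u = bsxfun(@rdivide, Amm * w, sqrt(n * lambda)');

    % Sanity checks, computed without ever forming the m-by-m matrix
    C_u = (Amm * (Amm' * u)) / n;            % equals (1/n*Amm*Amm') * u
    residual = max(max(abs(C_u - bsxfun(@times, u, lambda'))));
    ortho_err = max(max(abs(u' * u - eye(size(u, 2)))));
    ```

    Here the columns of u play the role of the PCA vectors and lambda holds the corresponding variances. At most n-1 of them are nonzero, because subtracting the mean sample makes the columns of Amm sum to zero.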