
Principal Component Analysis (MathWorks example source code)


My question is quite elementary, but I need help understanding the basic concepts. Consider the following example from the MathWorks documentation page for the princomp function:

load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent

we get the following values:

pc =

   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844

latent =

  517.7969
   67.4964
   12.4054
    0.2372

score =

   36.8218   -6.8709   -4.5909    0.3967
   29.6073    4.6109   -2.2476   -0.3958
  -12.9818   -4.2049    0.9022   -1.1261
   23.7147   -6.6341    1.8547   -0.3786
   -0.5532   -4.4617   -6.0874    0.1424
  -10.8125   -3.6466    0.9130   -0.1350
  -32.5882    8.9798   -1.6063    0.0818
   22.6064   10.7259    3.2365    0.3243
   -9.2626    8.9854   -0.0169   -0.5437
   -3.2840  -14.1573    7.0465    0.3405
    9.2200   12.3861    3.4283    0.4352
  -25.5849   -2.7817   -0.3867    0.4468
  -26.9032   -2.9310   -2.4455    0.4116

Legend:

latent is a vector containing the eigenvalues of the covariance matrix of X.

pc is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.

score is the principal component scores; that is, the representation of X in the principal component space. Rows of SCORE correspond to observations, columns to components.
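
The definitions of latent and pc above can be checked numerically. The following is a minimal sketch, assuming the Statistics Toolbox is installed so that hald and princomp are available; the names C, ev, eig_diff and resid are introduced here for illustration only.

```matlab
load hald;                                    % provides the 13-by-4 matrix "ingredients"
[pc, score, latent] = princomp(ingredients);

C = cov(ingredients);                         % 4-by-4 sample covariance matrix

% eig does not guarantee descending order, so sort before comparing with latent.
ev = sort(eig(C), 'descend');
eig_diff = max(abs(ev - latent));             % expected to be at rounding-error level

% Each column of pc should be an eigenvector of C: C*pc(:,k) = latent(k)*pc(:,k).
resid = max(max(abs(C*pc - pc*diag(latent))));
```

Both eig_diff and resid should be tiny compared with the eigenvalues themselves, with only floating-point rounding separating them from zero.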

Can somebody explain whether the values of score are genetrated somehow using the values of pc and if this true, what kind of computation is perfomed ?


Solution

  • Yes. It holds that score = norm_ingredients * pc, where norm_ingredients is the mean-centered version of your input matrix (each column has its own mean subtracted, so every column has zero mean), that is,

    norm_ingredients = ingredients - repmat(mean(ingredients), size(ingredients, 1), 1)
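
To confirm the relation above, one can recompute the scores by hand and compare them with the output of princomp. This is only a sketch under the same assumptions as the question (hald and princomp available from the Statistics Toolbox); score_manual and max_diff are names made up for this example, and bsxfun is used in place of repmat purely as an equivalent way of subtracting the column means.

```matlab
load hald;                                    % provides the 13-by-4 matrix "ingredients"
[pc, score, latent] = princomp(ingredients);

% Center each column by subtracting its mean (same result as the repmat version).
norm_ingredients = bsxfun(@minus, ingredients, mean(ingredients));

% Project the centered observations onto the principal component directions.
score_manual = norm_ingredients * pc;

% Largest absolute difference between the manual and the returned scores.
max_diff = max(abs(score(:) - score_manual(:)));

% Side check: the variance of each score column should equal the matching entry of latent.
var_diff = max(abs(var(score)' - latent));
```

Note that only the column means are subtracted; the columns are not rescaled to unit variance. On newer MATLAB releases, where pca is the documented replacement for princomp, the same check should work with [pc, score, latent] = pca(ingredients).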