I am trying to use PCA to visualize my implementation of the k-means algorithm. I am following the tutorial on Principal Component Coefficients, Scores, and Variances in this link.
I am using the following command, where X is my data:
[coeff,score,~] = pca(X');
My data is a 30-by-455 matrix, that is, 30 features and 455 samples. I have successfully used the score output to create a 2D plot for visualization. Now I want to project the 30-dimensional cluster centers onto that plane. I have tried
coeff*centers(:,1)
but I do not understand whether this is the correct usage.
How do I project a new 30-dimensional point onto the 2D plane of the first vs. the second principal components?
I assume that by centers(:, 1) you denote a new observation. To express this observation in the principal components, you should write
[coeff, score, ~, ~, ~, mu] = pca(X');  % also return the estimated mean "mu" (1-by-30)
tmp = centers(:, 1) - mu';              % remove the mean, since pca() centers the data by default
z = coeff' * tmp;                       % the new observation expressed in all principal components
z2 = z(1:2);                            % its coordinates in the plane of the 1st vs. 2nd component
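The snippet above handles one center. Below is a minimal sketch that projects all centers at once and overlays them on the score plot, assuming centers is a 30-by-k matrix with one center per column (as the centers(:, 1) indexing suggests) and that the Statistics and Machine Learning Toolbox (which provides pca) is installed. The random X and the choice of three data columns as centers are hypothetical stand-ins so the script runs on its own and can check itself against score.

```matlab
% Hypothetical stand-in data; replace X and centers with your own.
rng(0);                          % reproducible random numbers
X = randn(30, 455);              % 30 features x 455 samples, as in the question
centers = X(:, [1 2 3]);         % hypothetical "centers": three 30-D points, one per column

% Rows of X' are observations; mu is the 1-by-30 mean that pca() subtracts.
[coeff, score, ~, ~, ~, mu] = pca(X');

% Project every center at once: subtract the mean from each column, then rotate.
Z = coeff' * bsxfun(@minus, centers, mu');   % 30-by-k, all principal components
Z2 = Z(1:2, :);                              % keep the 1st and 2nd components only

% Sanity check: a training sample projected this way should reproduce its score row.
err = max(abs(Z(:, 1)' - score(1, :)));
fprintf('max difference vs. score(1,:): %g\n', err);

% Overlay the projected centers on the 2D score plot.
figure;
scatter(score(:, 1), score(:, 2), 10, 'filled');
hold on;
plot(Z2(1, :), Z2(2, :), 'kx', 'MarkerSize', 14, 'LineWidth', 2);
hold off;
xlabel('1st principal component');
ylabel('2nd principal component');
legend('samples', 'centers');
```

Because the stand-in centers are actual samples, the printed difference should be at the level of floating-point round-off; with real k-means centers there is no score row to compare against, but the projection formula is the same.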
Note that you have to subtract the mean, since pca() centers the data by default. Also note the transpose ' on coeff. Strictly speaking it should be inv(coeff), but since coeff is an orthogonal matrix, its inverse equals its transpose, so the transpose can be used instead.
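The same reasoning written as equations, using C for coeff, μ for the 1-by-30 row vector mu, and x for a 30-by-1 column such as centers(:, 1). This is a sketch of the argument, not a statement of how pca() is implemented internally: the centered point is written as a combination of the principal directions, and orthogonality turns the inverse into a transpose.

$$
\begin{aligned}
\mathbf{x} - \boldsymbol{\mu}^\top &= C\,\mathbf{z}
\;\Longrightarrow\;
\mathbf{z} = C^{-1}(\mathbf{x} - \boldsymbol{\mu}^\top) = C^\top(\mathbf{x} - \boldsymbol{\mu}^\top),
\qquad \text{since } C^\top C = I,\\[4pt]
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} &= C_{:,1:2}^\top\,(\mathbf{x} - \boldsymbol{\mu}^\top).
\end{aligned}
$$

Transposing the first line gives \(\mathbf{z}^\top = (\mathbf{x}^\top - \boldsymbol{\mu})\,C\), which has the same form as a row of score computed from the centered data, so new points land in the same coordinate system as the plotted samples.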