I'm new to PCA, and after some research I found that the PCA algorithm can be used to select the most effective features.
I want to use the pca function (in MATLAB) to select the best features for classifying data into two classes labeled "health" and "unhealthy" (supervised classification).
My question is: can I do this by setting some parameters on this function, or does the pca function not support this, so that I have to write the code myself?
As an example, I have a data set with 200 rows and 5 features:
1-Age
2-Weight
3-Height
4-Skin Color
5-Eye Color
and I want to use the "pca" function to find the effective features, for example:
1-Age
3-Height
5-Eye Color
to classify the data (2 classes with labels "health" and "unhealthy").
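% Remove the labels (stored in the last column)
features = AllMyData(:, 1:end-1);
% Get dimensions: m observations (rows), n features (columns)
[m, n] = size(features);
% Remove the mean of each feature (column-wise, not row-wise)
features = features - repmat(mean(features, 1), m, 1);
% Compute the economy-size SVD; the columns of V are the principal directions
[~, S, V] = svd(features, 'econ');
% Compute the number of eigenvectors representing 95% of the variation
% (variance is proportional to the squared singular values)
coverage = cumsum(diag(S).^2);
coverage = coverage ./ coverage(end);
nEig = find(coverage > 0.95, 1);
% Compute the squared norm of each feature's loadings in the reduced space
norms = zeros(n, 1);
for i = 1:n
    norms(i) = norm(V(i, 1:nEig))^2;
end
% Sort features by ascending norm (largest contribution last)
[~, idx] = sort(norms);
idx'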
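
For comparison, here is a minimal sketch of the same ranking done with the built-in pca function (Statistics and Machine Learning Toolbox) instead of a manual SVD. The random matrix X is only a stand-in for the real feature matrix, and the names nComp, contribution and ranked are made up for illustration. Note that pca never sees the class labels, so this ranks features by how strongly they load on the high-variance components, which is not necessarily the same as how useful they are for separating the two classes.

```matlab
% Sketch only: X is synthetic stand-in data (200 rows, 5 numeric features).
rng(0);                                  % reproducible random data
X = randn(200, 5);

% pca centers the columns itself; 'explained' holds the percentage of
% total variance carried by each principal component.
[coeff, ~, ~, ~, explained] = pca(X);

% Smallest number of components covering at least 95% of the variance.
nComp = find(cumsum(explained) >= 95, 1);

% Squared loading of each original feature on the retained components.
contribution = sum(coeff(:, 1:nComp).^2, 2);

% Feature indices ordered from largest to smallest contribution.
[~, ranked] = sort(contribution, 'descend');
disp(ranked');
```

With real data, X would be replaced by the feature columns (for example AllMyData(:, 1:end-1)), and any categorical features such as skin or eye color would first need a numeric encoding, since pca only accepts numeric input.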