I have a training set composed of 36 features. When I calculated the "explained" value of PCA using Matlab, I noticed that only the first 24 components are important.
My question is: would I gain better prediction accuracy if I omit the rest of the components (the other 12)? Or is SVM so resilient to noise that performance will not change much regardless of whether I remove them?
There is no general answer; it is impossible to say in advance "what will happen to method X if I preprocess with Y". In general, however:
- preprocessing using heuristics is a bad idea (PCA is just a heuristic: it never looks at the labels, so there is no justification from a supervised learning perspective for using it) - think about heuristics when the "pure" method fails, not before it fails
- the fact that PCA identifies some dimensions as less important does not mean they are noise; low variance is not the same as low relevance to the labels
- SVM's ability to deal with noise depends on the noise strength and the kernel used. For high-bias kernels such as linear or polynomial, noise should not be a problem; for low-bias kernels like RBF it will affect classification. But again, this applies to real noise, and your description does not fit the definition of real noise.
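
To make the second point concrete, here is a minimal sketch on made-up data (two hypothetical features, not your 36). One feature carries almost all of the variance but says nothing about the label; the other has tiny variance but separates the classes. The names `high_var`, `low_var` and `threshold_accuracy` are invented for this illustration, and a simple sign threshold stands in for the classifier. Because the two features are generated independently, the per-feature share of variance is used as an approximation of what PCA would report as "explained".

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the toy data is reproducible

# Hypothetical toy data (not the asker's 36 features):
#   high_var: large variance, unrelated to the label
#   low_var:  tiny variance, but its sign determines the label
n = 200
labels = [i % 2 for i in range(n)]
high_var = [random.gauss(0.0, 10.0) for _ in labels]
low_var = [(0.1 if y == 1 else -0.1) + random.gauss(0.0, 0.01) for y in labels]

# Share of total variance per feature, in percent. The features are generated
# independently, so this approximates the PCA "explained" values.
variances = [pvariance(high_var), pvariance(low_var)]
explained = [100.0 * v / sum(variances) for v in variances]


def threshold_accuracy(values, labels):
    """Accuracy of 'predict 1 if value > 0', taking the better sign orientation."""
    correct = sum((v > 0) == (y == 1) for v, y in zip(values, labels))
    acc = correct / len(labels)
    return max(acc, 1.0 - acc)


acc_high_var = threshold_accuracy(high_var, labels)
acc_low_var = threshold_accuracy(low_var, labels)

print(f"explained variance (%): {explained[0]:.2f}, {explained[1]:.2f}")
print(f"accuracy using only the high-variance feature: {acc_high_var:.2f}")
print(f"accuracy using only the low-variance feature:  {acc_low_var:.2f}")
```

In this constructed case, dropping the "unimportant" direction would throw away exactly the information the classifier needs. Whether anything like this happens in your data is an empirical question: cross-validate the SVM once on all 36 features and once on the first 24 component scores, and compare.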