machine-learning computer-vision svm libsvm pattern-recognition

non linear svm kernel dimension

I have some problems with understanding the kernels for non-linear SVM. First what I understood by non-linear SVM is: using kernels the input is transformed to a very high dimension space where the transformed input can be separated by a linear hyper-plane.

Kernel for e.g: RBF:

         K(x_i, x_j) = exp(-||x_i - x_j||^2/(2*sigma^2));

where x_i and x_j are two inputs. here we need to change the sigma to adapt to our problem.

       (1) Say if my input dimension is d, what will be the dimension of the 
           transformed space?

       (2) If the transformed space has a dimension of more than 10000 is it 
           effective to use a linear SVM there to separate the inputs?

Solution

Well it is not only a matter of increasing the dimension. That's the general mechanism but not the whole idea, if it were true that the only goal of the kernel mapping is to increase the dimension, one could conclude that all kernels functions are equivalent and they are not.

The way how the mapping is made would make possible a linear separation in the new space. Talking about your example and just to extend a bit what greeness said, RBF kernel would order the feature space in terms of hyperspheres where an input vector would need to be close to an existing sphere in order to produce an activation.

So to answer directly your questions:

1) Note that you don't work on feature space directly. Instead, the optimization problem is solved using the inner product of the vectors in the feature space, so computationally you won't increase the dimension of the vectors.

2) It would depend on the nature of your data, having a high dimensional pattern would somehow help you to prevent overfitting but not necessarily will be linearly separable. Again, the linear separability in the new space would be achieved because the way the map is made and not only because it is in a higher dimension. In that sense, RBF would help but keep in mind that it might not perform well on generalization if your data is not locally enclosed.