
svm scaling input values


I am using libSVM. Say my feature values are in the following format:

                         instance1 : f11, f12, f13, f14
                         instance2 : f21, f22, f23, f24
                         instance3 : f31, f32, f33, f34
                         instance4 : f41, f42, f43, f44
                         ..............................
                         instanceN : fN1, fN2, fN3, fN4

I think there are two kinds of scaling that can be applied.

  1. Scale each instance vector so that it has zero mean and unit variance:

        ((f11, f12, f13, f14) - mean((f11, f12, f13, f14))) ./ std((f11, f12, f13, f14))

  2. Scale each column of the above matrix to a range, for example [-1, 1] (see the sketch after this list).
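
To make the two options concrete, here is a rough NumPy sketch; the array X and its toy values are made up purely for illustration, and libSVM itself is not involved at this step:

    # Two scaling options for a feature matrix X with one instance per row.
    import numpy as np

    X = np.array([[1.0, 200.0, 0.5, 30.0],
                  [2.0, 180.0, 0.7, 25.0],
                  [1.5, 210.0, 0.6, 40.0],
                  [0.5, 190.0, 0.4, 35.0]])   # toy values, not real data

    # Option 1: scale each instance (row) to zero mean and unit variance.
    row_mean = X.mean(axis=1, keepdims=True)
    row_std = X.std(axis=1, keepdims=True)
    X_rows = (X - row_mean) / row_std

    # Option 2: scale each attribute (column) linearly into [-1, 1].
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_cols = 2.0 * (X - col_min) / (col_max - col_min) - 1.0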

In my experiments with the RBF kernel (libSVM), I found that the second scaling (2) improves the results by about 10%. I do not understand why (2) gives improved results.

Could anybody explain the reason for applying scaling, and why the second option gives improved results?


Solution

  • The standard thing to do is to make each dimension (or attribute, or column, in your example) have zero mean and unit variance.

    This brings each dimension of the input into the same magnitude. From http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf:

    The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems. We recommend linearly scaling each attribute to the range [-1, +1] or [0, 1].
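
    For completeness, a minimal sketch of the column-wise standardization described above, assuming NumPy and illustrative placeholder data; the same training statistics are reused for the test data, which is the usual practice:

        # Column-wise standardization: zero mean, unit variance per attribute.
        import numpy as np

        X_train = np.random.rand(100, 4) * 50.0   # placeholder training data
        X_test = np.random.rand(20, 4) * 50.0     # placeholder test data

        mean = X_train.mean(axis=0)
        std = X_train.std(axis=0)
        std[std == 0] = 1.0                       # guard against constant attributes

        X_train_scaled = (X_train - mean) / std
        # Apply the *same* training statistics to the test data.
        X_test_scaled = (X_test - mean) / std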