spdiags and feature scaling


According to the libsvm FAQ, the following one-line code scales each feature to the range [0,1] in MATLAB:

(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))

so I'm using this code:

v_feature_trainN=(v_feature_train - repmat(mini,size(v_feature_train,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_train,2),size(v_feature_train,2));
 v_feature_testN=(v_feature_test - repmat(mini,size(v_feature_test,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_test,2),size(v_feature_test,2));

where I use the first one to train the classifier and the second one to classify...

In my humble opinion scaling should be performed by:

x_scaled = (x - min(x)) / (max(x) - min(x))

i.e.:

v_feature_trainN2=(v_feature_train -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
v_feature_test_N2=(v_feature_test  -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));

I compared the classification results of these two scaling methods, and the first one outperforms the second one. My questions are:

1) What exactly does the first method do? I didn't understand it.
2) Why does the code suggested by libsvm outperform the second one (e.g. 80% vs 60%)?

Thank you so much in advance.


Solution

  • First of all, the code from the libsvm FAQ does something different from your code:

    It maps every column independently onto the interval [0,1]. Your code, however, uses the global min and max, so it applies the same affine transformation to all columns instead of a separate transformation for each column.
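
To make the difference concrete, here is a minimal sketch on made-up data with two features on very different scales. The matrix values and variable names (colMin, colMax, gMin, gMax) are illustrative assumptions, not taken from the question's data.

```matlab
% Toy data: two features on very different scales (values are made up).
data = [1 100; 2 300; 3 500];

% Per-column scaling, equivalent to the libsvm FAQ one-liner.
colMin = min(data, [], 1);            % row vector of column minima
colMax = max(data, [], 1);            % row vector of column maxima
perColumn = bsxfun(@rdivide, bsxfun(@minus, data, colMin), colMax - colMin);

% Global scaling, as in the question's second method.
gMin = min(data(:));                  % single minimum over all entries
gMax = max(data(:));                  % single maximum over all entries
globalScaled = (data - gMin) ./ (gMax - gMin);

disp(perColumn);     % every column spans the full range [0,1]
disp(globalScaled);  % the small-valued first column is squeezed close to 0
```

With global scaling the first feature ends up with almost no spread, while per-column scaling gives both features the same range.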


    The first code works in the following way:

    • (data - repmat(min(data,[],1),size(data,1),1))
      This subtracts each column's minimum from the entire column. It does this by computing the row vector of minima min(data,[],1) which is then replicated to build a matrix the same size as data. Then it is subtracted from data.

    • spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
      This generates a diagonal matrix. The entry (i,i) of this matrix is 1 divided by the difference of the maximum and the minimum of the ith column: max(data(:,i))-min(data(:,i)).

    • Right-multiplying by this diagonal matrix multiplies each column of the left matrix by the corresponding diagonal entry. This effectively divides column i by max(data(:,i))-min(data(:,i)).
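
The three steps above can be sketched separately as follows. The data and the intermediate names (colMin, colRange, shifted, D, scaled) are made up for illustration. The sketch assumes no column is constant, since a zero range would lead to a division by zero and that column would need separate handling.

```matlab
data = [1 100; 2 300; 3 500];           % made-up example data
[n, d] = size(data);

colMin   = min(data, [], 1);            % 1-by-d row vector of column minima
colRange = max(data, [], 1) - colMin;   % 1-by-d row vector of column ranges

% Step 1: subtract each column's minimum from that column.
shifted = data - repmat(colMin, n, 1);

% Step 2: build the d-by-d sparse diagonal matrix with entries 1/range.
D = spdiags(1 ./ colRange', 0, d, d);

% Step 3: right-multiplication scales column i of shifted by D(i,i).
scaled = full(shifted * D);

% Compare against the original one-liner from the libsvm FAQ.
oneLiner = full((data - repmat(min(data,[],1),size(data,1),1)) * ...
    spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2)));
disp(max(abs(scaled(:) - oneLiner(:))));  % difference between the two results
```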


    Instead of using a sparse diagonal matrix, you could do this even more efficiently with bsxfun:

    bsxfun(@rdivide, ...
           bsxfun(@minus, ...
                  data, min(data,[],1)), ...
           max(data,[],1)-min(data,[],1))
    

    This is the MATLAB way of writing:

    • Divide:
      • The difference of:
        • each column and its respective minimum
      • by the difference of each column's max and min.
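
As a further sketch, in MATLAB R2016b and later the same per-column scaling can be written with implicit expansion, without bsxfun. The example below also applies the training set's minima and maxima to the test set, as the question does with mini and maxi. How mini and maxi were actually computed is not shown in the question, so taking them as the per-column min and max of the training data is an assumption here, and the data is made up.

```matlab
% Made-up training and test data.
trainX = [1 100; 2 300; 3 500];
testX  = [2 200; 4 600];

% Assumption: scaling parameters come from the training set only.
mini = min(trainX, [], 1);
maxi = max(trainX, [], 1);

% Implicit expansion (requires MATLAB R2016b or later).
trainN = (trainX - mini) ./ (maxi - mini);
testN  = (testX  - mini) ./ (maxi - mini);

% Equivalent bsxfun form for older releases.
trainN_bsx = bsxfun(@rdivide, bsxfun(@minus, trainX, mini), maxi - mini);

disp(trainN);  % training columns span [0,1]
disp(testN);   % test values may fall outside [0,1]
```

Test values outside the training range map outside [0,1], which is expected when the scaling parameters are fixed from the training data.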