According to libsvm faqs, the following one-line code scale each feature to the range of [0,1] in Matlab
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
so I'm using this code:
v_feature_trainN=(v_feature_train - repmat(mini,size(v_feature_train,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_train,2),size(v_feature_train,2));
v_feature_testN=(v_feature_test - repmat(mini,size(v_feature_test,1),1))*spdiags(1./(maxi-mini)',0,size(v_feature_test,2),size(v_feature_test,2));
where I use the first one to train the classifier and the second one to classify...
In my humble opinion scaling should be performed by:
i.e.:
v_feature_trainN2=(v_feature_train -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
v_feature_test_N2=(v_feature_test -min(v_feature_train(:)))./(max(v_feature_train(:))-min((v_feature_train(:))));
Now I compared the classification results using these two scaling methods and the first one outperforms the second one. The question are: 1) What exactly does the first method? I didn't understand it. 2) Why the code suggested by libsvm outperforms the second one (e.g. 80% vs 60%)? Thank you so much in advance
First of all:
The code described in the libsvm
does something different than your code:
It maps every column independently onto the interval [0,1]
.
Your code however uses the global min
and max
to map all the columns using the same affine transformation instead of a separate transformation for each column.
The first code works in the following way:
(data - repmat(min(data,[],1),size(data,1),1))
This subtracts each column's minimum from the entire column. It does this by computing the row vector of minima min(data,[],1)
which is then replicated to build a matrix the same size as data
. Then it is subtracted from data
.
spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
This generates a diagonal matrix. The entry (i,i)
of this matrix is 1 divided by the difference of the maximum and the minimum of the i
th column: max(data(:,i))-min(data(:,i))
.
The right multiplication of this diagonal matrix means: Multiply each column of the left matrix with the corresponding diagonal entry. This effectively divides column i
by max(data(:,i))-min(data(:,i))
.
Instead of using a sparse diagonal matrix, you could do this even more efficiently with bsxfun
:
bsxfun(@rdivide, ...
bsxfun(@minus, ...
data, min(data,[],1)), ...
max(data,[],1)-min(data,[],1))
Which is the matlab way of writing:
max
and min
.