How LIBSVM works performs multivariate regression is my generalized question? In detail, I have some data for certain number of links. (Example 3 links). Each link has 3 dependent variables which when used in a model gives output Y. I have data collected on these links in some interval.
LinkId | var1 | var2 | var3 | var4(OUTPUT)
1 | 10 | 12.1 | 2.2 | 3
2 | 11 | 11.2 | 2.3 | 3.1
3 | 12 | 12.4 | 4.1 | 1
1 | 13 | 11.8 | 2.2 | 4
2 | 14 | 12.7 | 2.3 | 2
3 | 15 | 10.7 | 4.1 | 6
1 | 16 | 8.6 | 2.2 | 6.6
2 | 17 | 14.2 | 2.3 | 4
3 | 18 | 9.8 | 4.1 | 5
I need to perform prediction to find the output of
(2,19,10.2,2.3).
How can I do that using above data for training in Matlab using LIBSVM? Can I train the whole data as input to the svmtrain to create a model or do I need to train each link separate and use the model create for prediction? Does it make any difference? NOTE : Notice each link with same ID has same value.
This is not really a matlab
or libsvm
question but rather a generic svm
related one.
How LIBSVM works performs multivariate regression is my generalized question?
LibSVM is just a library, which in particular - implements the Support Vector Regression model for the regression tasks. In short words, in a linear case, SVR tries to find a hyperplane for which your data points are placed in some margin around it (which is quite a dual approach to the classical SVM which tries to separate data with as big margin as possible).
In non linear case the kernel trick is used (in the same fashion as in SVM), so it is still looking for a hyperplane, but in a feature space induced by the particular kernel, which results in the non linear regression in the input space.
Quite nice introduction to SVRs' can be found here: http://alex.smola.org/papers/2003/SmoSch03b.pdf
How can I do that using above data for training in Matlab using LIBSVM? Can I train the whole data as input to the svmtrain to create a model or do I need to train each link separate and use the model create for prediction? Does it make any difference? NOTE : Notice each link with same ID has same value.
You could train SVR
(as it is a regression problem) with the whole data, but:
var3
and LinkId
are the same variables (1->2.2, 2->2.3, 3->4.1
), if this is a case you should remove the LinkId
column,var1
unique ascending integers? If so, these are also probably a useless featues (as they do not seem to carry any information, they seem to be your id
numbers),[0,1]
interval, otherwise some features may become more important than others just because of their scale.Now, if you would like to create a separate model for each link, and follow above clues, you end up with 1
input variable (var2
) and 1
output variable var4
, so I would not recommend such a step. In general it seems that you have very limited featues set, it would be valuable to gather more informative features.