I want to use SVM (Support vector machine) in my program, but I could not get the true result.
I want to know that how we must train data for SVM.
What I am doing:
Think that we have 5 document (the numbers are just an example), 3 of them is on first category and others (2 of them) are on second category, I merge the categories to each other (it means that the 3 doc that are in the first category will merge in one document), after that I made a train array like this:
double[][] train = new double[cat1.getDocument().getAttributes().size() + cat2.getDocument().getAttributes().size()][];
and I will fill the array like this:
int i = 0;
Iterator<String> iteraitor = cat1.getDocument().getAttributes().keySet().iterator();
Iterator<String> iteraitor2 = cat2.getDocument().getAttributes().keySet().iterator();
while (i < train.length) {
if (i < cat2.getDocument().getAttributes().size()) {
while (iteraitor2.hasNext()) {
String key = (String) iteraitor2.next();
Long value = cat2.getDocument().getAttributes().get(key);
double[] vals = { 0, value };
train[i] = vals;
i++;
System.out.println(vals[0] + "," + vals[1]);
}
} else {
while (iteraitor.hasNext()) {
String key = (String) iteraitor.next();
Long value = cat1.getDocument().getAttributes().get(key);
double[] vals = { 1, value };
train[i] = vals;
i++;
System.out.println(vals[0] + "," + vals[1]);
}
i++;
}
so I will continue like this to get the model :
svm_problem prob = new svm_problem();
int dataCount = train.length;
prob.y = new double[dataCount];
prob.l = dataCount;
prob.x = new svm_node[dataCount][];
for (int k = 0; k < dataCount; k++) {
double[] features = train[k];
prob.x[k] = new svm_node[features.length - 1];
for (int j = 1; j < features.length; j++) {
svm_node node = new svm_node();
node.index = j;
node.value = features[j];
prob.x[k][j - 1] = node;
}
prob.y[k] = features[0];
}
svm_parameter param = new svm_parameter();
param.probability = 1;
param.gamma = 0.5;
param.nu = 0.5;
param.C = 1;
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.LINEAR;
param.cache_size = 20000;
param.eps = 0.001;
svm_model model = svm.svm_train(prob, param);
Is this way correct? if not please help me to make it true.
these two answers are true : answer one , answer two,
Even without examining the code one can find conceptual errors:
think that we have 5 document , 3 of them is on first category and others( 2 of them) are on second category , i merge the categories to each other (it means that the 3 doc that are in the first category will merge in one document ) ,after that i made a train array like this
So: