java, weka, libsvm, text-classification

Using libsvm in Java for String classification


I looked around but could not find a good way to use libsvm with Java, and I still have some open questions:

1) Is it possible to use libsvm on its own, or do I also have to use weka? If both work, what is the difference?

2) When the data are of String type, how can I pass the training set as Strings? I used MATLAB for a similar protein classification problem, and there I just gave the strings to the machine without any problem. Is there a way to do this in Java?

Here is an incomplete example of what I did in MATLAB (it works):

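% Read the positive training and test sequences (FASTA format).
[~,posTrain] = fastaread('dataset/1.25.1.3_d1ilk__.pos-train.seq');
[~,posTest] = fastaread('dataset/1.25.1.3_d1ilk__.pos-test.seq');
% (Building trainData/testData and trainLabel/testLabel is not shown in this excerpt.)
% Compute the spectrum kernel matrices for substrings of length k.
trainKernel = spectrumKernel(trainData,k);
testKernel = spectrumKernel(testData,k);
% Prepend the sample serial numbers, as libsvm's precomputed-kernel format requires.
trainKf =[(1:length(trainData))', trainKernel];
testKf = [(1:length(testData))', testKernel];
disp('custom');
% Train with a precomputed kernel (-t 4) and evaluate on the test set.
model = libsvmtrain(trainLabel,trainKf,'-t 4');
[~, accuracy, ~] = libsvmpredict(testLabel,testKf,model)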

As you can see, I read the files in FASTA format and feed the sequences to libsvm, but libsvm for Java looks like it wants something called Node that is made of doubles. What I did was take the byte[] from each String and convert the bytes to Double. Is that correct?

3) How do I use a custom kernel? I found this line of code:

 KernelManager.setCustomKernel(custom_kernel);      

but I can't find it in my libsvm.jar. Which library do I have to use?

Sorry for the multiple questions. I hope you can give me a brief overview of what is going on here. Thanks.


Solution

  • Please note that I've used LIBSVM for MATLAB, but not for Java. I can only really answer question 1, but hopefully this still helps:

    1. It is definitely possible to use libsvm on its own; the code is available here: https://www.csie.ntu.edu.tw/~cjlin/libsvm/. Note that jlibsvm is a port of libsvm, and it seems to be easier to use and more optimized for Java. As far as I can tell, weka just has a wrapper class that runs libsvm anyway (it even requires libsvm.jar), though I am mainly basing that on this page: https://weka.wikispaces.com/LibSVM.
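
I can't vouch for the Java side, but the MATLAB snippet in the question boils down to a precomputed kernel, and that part can be sketched in plain Java without any library. The sketch below assumes spectrumKernel is the usual k-spectrum kernel (the dot product of substring counts of length k), which the question does not actually show. The names SpectrumKernelSketch, kmerCounts, spectrumKernel and precomputedRows are made up for illustration. The rows follow the same layout as [(1:length(trainData))', trainKernel]: serial number first, then the kernel values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of libsvm or weka.
final class SpectrumKernelSketch {

    // Counts every substring of length k (k-mer) in s.
    static Map<String, Integer> kmerCounts(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // Assumed definition of the spectrum kernel:
    // dot product of the two k-mer count vectors.
    static double spectrumKernel(String a, String b, int k) {
        Map<String, Integer> countsA = kmerCounts(a, k);
        Map<String, Integer> countsB = kmerCounts(b, k);
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            Integer other = countsB.get(e.getKey());
            if (other != null) {
                sum += (double) e.getValue() * other;
            }
        }
        return sum;
    }

    // Builds rows in the precomputed-kernel layout:
    // column 0 is the 1-based serial number of the row,
    // column j (j >= 1) is K(rows[i], basis[j - 1]).
    // For training, rows and basis are both the training set;
    // for testing, rows is the test set and basis the training set.
    static double[][] precomputedRows(String[] rows, String[] basis, int k) {
        double[][] out = new double[rows.length][basis.length + 1];
        for (int i = 0; i < rows.length; i++) {
            out[i][0] = i + 1;
            for (int j = 0; j < basis.length; j++) {
                out[i][j + 1] = spectrumKernel(rows[i], basis[j], k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] train = {"ACGTAC", "ACGACG", "TTTTTT"};
        for (double[] row : precomputedRows(train, train, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the Java libsvm, each row would presumably be turned into an svm_node[] (index j, value row[j]) and the kernel_type set to svm_parameter.PRECOMPUTED, which corresponds to '-t 4' in the MATLAB call. I have not tested that part, so treat it as an assumption. It would also mean that no conversion from byte[] to Double is needed, since the strings are only ever seen by the kernel function.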