I am new to Python and am trying to use LIBSVM. I want to do cross-validation with grid.py. My data comes from a database, so it is not in sparse form. Is there a way to convert it to the sparse format that grid.py requires? The documentation states that the dataset should be in this format:
<label> <index1>:<value1> <index2>:<value2> ...
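I tried svm_train, and with the help of svm_parameter I was able to generate a model:
from svmutil import *  # provides svm_problem, svm_parameter, svm_train
y, x = location_list, data_list  # labels and dense feature rows
prob = svm_problem(y, x)
param = svm_parameter('-t 2')  # -t 2 selects the RBF kernel
model = svm_train(prob, param)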
Here my training data is in this format:
location_list=[8143L,8163L....]
data_list=[[ -62L, -72L, -62L, -55L, -75L, -66L, -66L, -56L, -57L, -76L, -75L, -79L, -68L, -74L,
-59L....],[-62L, -72L, -62L, -55L, -75L, -66L, -66L, -56L, -57L, -76L, -75L, -79L, -68L, -74L,
-59L....],......]
I tried passing prob as the dataset to grid.py, but it says the dataset was not found, so maybe the data has to be written to a file. Or is there a way to pass the prob variable to grid.py as the dataset?
I can get my training data in the format shown above or below:
[8143L, -62L, -72L, -62L, -55L, -75L, -66L, -66L, -56L, -57L, -76L, -75L, -79L, -68L, -74L,
-59L,...]
[8163L, -62L, -72L, -62L, -55L, -75L, -66L, -66L, -56L, -57L, -76L, -75L, -79L, -68L, -74L,
-59L...]
...................
where the first value in each row (8143, 8163) is the label (class) and the rest are the features. So my questions are:
1) How can I convert this dataset into sparse form and save it to a file that I can pass to grid.py?
2) Can I save the prob variable to a file?
3) Can I pass the prob variable to grid.py directly, without saving it to a file?
I am going to answer my own question. I saved my data from the database to a CSV file and used csv2libsvm.py to convert the CSV to LIBSVM format:
csv2libsvm.py <input file> <output file> [<label index = 0>] [<skip headers = 0>]
For example:
python csv2libsvm.py mydata.csv libsvm.data 0 True
This converts the CSV file to LIBSVM format. If the input file has no labels, specify label index = -1. If it has a header row, specify skip headers = 1.
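If you would rather skip the intermediate CSV file, the same format can probably be written straight from the two lists in the question. This is only a minimal sketch, not part of LIBSVM: the helper names to_libsvm_line and write_libsvm_file and the keep_zeros parameter are made up for illustration, and it assumes location_list holds the labels and data_list holds one dense feature row per label. Indices start at 1, and zero-valued features are left out, which is what makes the file sparse.

```python
def to_libsvm_line(label, features, keep_zeros=False):
    """Format one sample as '<label> <index1>:<value1> ...' (1-based indices)."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        # Sparse format: zero-valued features may simply be omitted.
        if value == 0 and not keep_zeros:
            continue
        parts.append("%d:%s" % (index, value))
    return " ".join(parts)


def write_libsvm_file(path, labels, rows):
    """Write one line per (label, feature row) pair to a text file."""
    with open(path, "w") as handle:
        for label, features in zip(labels, rows):
            handle.write(to_libsvm_line(label, features) + "\n")


# Hypothetical usage with the lists from the question:
# write_libsvm_file("libsvm.data", location_list, data_list)
```

The resulting file should then be usable as the dataset argument, e.g. python grid.py libsvm.data. As far as I can tell grid.py works on a dataset file rather than an in-memory svm_problem, so writing the data out is the straightforward route.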