Docs: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html
svmlight follows the data format:
<target> <feature:value> <feature:value>
With the data:
a = [[1,2,3],[4,5,6]]
b = [8,9]
Running the command:
dump_svmlight_file(a,b,'test.txt')
Outputs the following:
8 0:1 1:2 2:3
9 0:4 1:5 2:6
Is there a way to specify the feature name rather than have it increment from 0? I would like my result to look something like the following:
1 10:5 50:15 100:50
0 10:15 25:5 75:15
1 20:5 40:5 60:5
Does the dump_svmlight_file command have such a capability?
No, dump_svmlight_file does not have that option built in (see the source code). You can only specify whether the feature indices should start at 0 or 1, using the zero_based parameter (see the documentation).
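As a possible workaround (not something this answer originally proposed), note that dump_svmlight_file writes only the non-zero entries of each row, keyed by column index. If the feature names are non-negative integers, as in the question, you can place each value in the column with that index of a sparse matrix. This is a minimal sketch; the samples dict layout and the temporary file path are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical input layout: one dict per sample, mapping an integer
# feature index to its value (the numbers come from the question).
samples = [
    {10: 5, 50: 15, 100: 50},
    {10: 15, 25: 5, 75: 15},
    {20: 5, 40: 5, 60: 5},
]
y = np.array([1, 0, 1])

# Collect (row, column, value) triplets for the sparse matrix.
rows, cols, vals = [], [], []
for i, sample in enumerate(samples):
    for j, v in sample.items():
        rows.append(i)
        cols.append(j)
        vals.append(v)

# The matrix must be wide enough to hold the largest feature index.
n_features = max(cols) + 1
X = csr_matrix((vals, (rows, cols)), shape=(len(samples), n_features))

# Only the non-zero entries are written, each as <column index>:<value>.
sparse_path = os.path.join(tempfile.mkdtemp(), "sparse.txt")
dump_svmlight_file(X, y, sparse_path)

# Round trip to check that the indices survived.
X_back, y_back = load_svmlight_file(
    sparse_path, n_features=n_features, zero_based=True
)
```

This does not work for string feature names, since the svmlight format itself only stores integer indices.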
I would suggest not dumping the file with the actual feature names, which would unnecessarily increase the size of the file. Instead, pickle your feature names as a separate file and join them with the data after loading.
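A minimal sketch of that suggestion, using the data from the question. The feature_names values, the file names, and the temporary directory are hypothetical, chosen only for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([8, 9])
feature_names = [10, 50, 100]  # hypothetical names, one per column of a

workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "test.txt")
names_path = os.path.join(workdir, "feature_names.pkl")

# Dump the data with the default 0-based column indices.
dump_svmlight_file(a, b, data_path)

# Store the feature names separately.
with open(names_path, "wb") as fh:
    pickle.dump(feature_names, fh)

# Later: load both files and join names with values.
X_loaded, y_loaded = load_svmlight_file(data_path, zero_based=True)
with open(names_path, "rb") as fh:
    names_loaded = pickle.load(fh)

# One dict per sample, mapping feature name -> value.
named_rows = [dict(zip(names_loaded, row)) for row in X_loaded.toarray()]
```

The position of a name in feature_names is what ties it to a column, so the list has to be kept in the same order as the columns of the dumped matrix.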