Hi there. I am new to this and got confused while searching for how to approach it. I want to create a sparse ARFF file for text classification in Weka, and the file has to be compatible with Weka. The outline of the ARFF file should look like this:
@relation myrelation
@attribute att0 numeric
@attribute att1 numeric
@data
{0,1,4,5 , A}
{0,5,2,1 , B}
My data set consists of strings, each followed by a class, for example:
string is a string A
Hello a string B
Another is string C
.
.
.
The string comes first, followed by its class (A, B, or C). I want to convert this data set into the sparse ARFF format outlined above. Can somebody point me in the right direction? I would like to do it in Java.
You can use Weka's StringToWordVector filter to convert the text into a word vector (but not necessarily a sparse matrix). Take a look at my tutorial on this.
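To make the target format concrete, here is a minimal, dependency-free Java sketch that writes a sparse ARFF file by hand from (text, class) pairs. It is not the StringToWordVector route, just a plain bag-of-words count, and the class name SparseArffSketch, the method toSparseArff, the word_ attribute prefix and the tokenizer are all made up for illustration. It assumes class labels contain no spaces or special characters (otherwise they would need quoting). Note that a sparse ARFF data row is a list of "index value" pairs inside braces, with 0-based attribute indices in ascending order and zero-valued attributes left out, so it looks a bit different from the outline in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Weka: hand-rolled bag-of-words to sparse ARFF.
class SparseArffSketch {

    /** Builds a sparse ARFF document from texts and their class labels. */
    static String toSparseArff(String relation, List<String> texts, List<String> labels) {
        // Vocabulary: word -> attribute index, in order of first appearance.
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String text : texts) {
            for (String word : tokenize(text)) {
                vocab.putIfAbsent(word, vocab.size());
            }
        }
        // Distinct class labels, in order of first appearance.
        Set<String> classes = new LinkedHashSet<>(labels);
        // The class attribute is declared last, so its index follows the words.
        int classIndex = vocab.size();

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (String word : vocab.keySet()) {
            out.append("@attribute word_").append(word).append(" numeric\n");
        }
        out.append("@attribute class {").append(String.join(",", classes)).append("}\n\n");
        out.append("@data\n");

        for (int i = 0; i < texts.size(); i++) {
            // TreeMap keeps attribute indices in ascending order.
            Map<Integer, Integer> counts = new TreeMap<>();
            for (String word : tokenize(texts.get(i))) {
                counts.merge(vocab.get(word), 1, Integer::sum);
            }
            // Only words that occur in this text are written; all others are implicit zeros.
            List<String> cells = new ArrayList<>();
            for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
                cells.add(e.getKey() + " " + e.getValue());
            }
            // The class value is always written explicitly in this sketch.
            cells.add(classIndex + " " + labels.get(i));
            out.append("{").append(String.join(", ", cells)).append("}\n");
        }
        return out.toString();
    }

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("string is a string", "Hello a string", "Another is string");
        List<String> labels = Arrays.asList("A", "B", "C");
        System.out.print(toSparseArff("myrelation", texts, labels));
    }
}
```

For the three example strings from the question, the data section comes out as {0 2, 1 1, 2 1, 5 A}, {0 1, 2 1, 3 1, 5 B} and {0 1, 1 1, 4 1, 5 C}, where attribute 0 is "string" and attribute 5 is the class. If you would rather let Weka do the work, the usual route is to load the raw text as a string attribute, apply StringToWordVector, and write the result with ArffSaver. Check the saved file to confirm it is in sparse form for your Weka version.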