java weka text-classification arff

Generate an Arff File for Weka


Hi there, I am new to this and got confused while searching for how to approach it. I want to create a sparse ARFF file for Weka for text classification, and I have been searching online for how to get started. My requirement is to generate a sparse ARFF file that is compatible with Weka. The outline of the ARFF should look roughly like this:

 @relation myrelation
 @attribute att0 numeric
 @attribute att1 numeric
 @data
 {0,1,4,5 , A}
 {0,5,2,,1 B}

I have some strings, each followed by a class. Suppose my data set is as follows:

 string is a string A
 Hello a string B
 Another is string C
 .
 .
 .

First comes the string and then the class (A, B, C, and so on). What I want is to convert my data set into the sparse ARFF format mentioned above. Can somebody point me in the right direction? I would like to do it in Java.


Solution

  • You can use Weka's StringToWordVector filter to convert the text into a word vector (but not necessarily a sparse matrix). Take a look at my tutorial on this.
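
If you would rather see what the sparse file itself has to look like, here is a minimal sketch in plain Java that does not use Weka at all. In sparse ARFF each data row is a brace-enclosed list of "index value" pairs (attribute indices start at 0 and increase within a row), and entries whose value is 0 are left out. The class name SparseArffSketch and the method toSparseArff are hypothetical names made up for this illustration. The sketch assumes simple word counts, tokens made of lower-cased ASCII letters and digits, and class labels that need no quoting. Weka reads an omitted entry as 0, which for a nominal attribute means its first declared label, so the sketch always writes the class explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: writes word counts as sparse ARFF text.
class SparseArffSketch {

    static String toSparseArff(String relation, List<String> texts, List<String> labels) {
        if (texts.size() != labels.size()) {
            throw new IllegalArgumentException("texts and labels must have the same size");
        }
        // Each distinct token gets one attribute index, in order of first appearance.
        Map<String, Integer> vocab = new LinkedHashMap<>();
        // One map per document: attribute index -> count, kept sorted by index.
        List<Map<Integer, Integer>> rows = new ArrayList<>();
        for (String text : texts) {
            Map<Integer, Integer> counts = new TreeMap<>();
            // Assumption: tokens are lower-cased runs of ASCII letters and digits.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Integer index = vocab.get(token);
                if (index == null) {
                    index = vocab.size();
                    vocab.put(token, index);
                }
                counts.merge(index, 1, Integer::sum);
            }
            rows.add(counts);
        }

        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (String token : vocab.keySet()) {
            // The "w_" prefix keeps word attributes from clashing with "class".
            out.append("@attribute w_").append(token).append(" numeric\n");
        }
        // Assumption: labels are simple tokens that need no ARFF quoting.
        out.append("@attribute class {")
           .append(String.join(",", new TreeSet<>(labels)))
           .append("}\n\n@data\n");

        int classIndex = vocab.size(); // the class is the last attribute
        for (int i = 0; i < rows.size(); i++) {
            out.append('{');
            for (Map.Entry<Integer, Integer> e : rows.get(i).entrySet()) {
                out.append(e.getKey()).append(' ').append(e.getValue()).append(", ");
            }
            out.append(classIndex).append(' ').append(labels.get(i)).append("}\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> texts = List.of("string is a string", "Hello a string", "Another is string");
        List<String> labels = List.of("A", "B", "C");
        System.out.print(toSparseArff("myrelation", texts, labels));
    }
}
```

For the three example strings from the question, the vocabulary is string, is, a, hello, another (indices 0 to 4) and the class is attribute 5, so the data section comes out as:

 {0 2, 1 1, 2 1, 5 A}
 {0 1, 2 1, 3 1, 5 B}
 {0 1, 1 1, 4 1, 5 C}

For real data you will probably want proper tokenisation, stop words, and TF-IDF weighting, which is where the StringToWordVector filter is the more practical route.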