Tags: machine-learning, nlp, weka, n-gram, arff

How to represent n-gram features in an ARFF file?


I've been searching the net for this issue but have not been able to find a solution. I first use the Weka API in Java to extract n-gram features; one example of such a feature is

"not good"

The problem starts here. The header of the ARFF file looks something like this:

@relation words
@attribute {0,1} not good

but after creating the ARFF file, when I try to process it, an exception is thrown indicating that the structure of the ARFF file is not correct.


Solution

  • You have the attribute name and its possible values in the wrong order: the name comes first, followed by the set of nominal values. Also, attribute names that contain a space must be quoted. The example .arff file below should load.

    http://www.cs.waikato.ac.nz/ml/weka/arff.html

    @relation words
    
    @attribute 'not good' {0,1}
    
    @data
    
    0
    1
    0
    1
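
If you write the header yourself rather than through Weka's own classes, the quoting rule above can be applied mechanically. The following is a minimal sketch in plain Java, not Weka's implementation: the class `ArffHeaderSketch` and its methods `quoteName`, `binaryAttribute`, and `header` are hypothetical names introduced for illustration, and the choice to escape embedded single quotes and backslashes with a backslash is an assumption about what the ARFF reader accepts.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Weka): builds an ARFF header for binary n-gram features.
public class ArffHeaderSketch {

    // Wrap the name in single quotes when it contains whitespace or ARFF punctuation.
    // Assumption: embedded backslashes and single quotes are escaped with a backslash.
    static String quoteName(String name) {
        boolean needsQuoting = name.isEmpty();
        for (char c : name.toCharArray()) {
            if (Character.isWhitespace(c) || c == ',' || c == '{' || c == '}'
                    || c == '\'' || c == '"' || c == '%' || c == '\\') {
                needsQuoting = true;
            }
        }
        if (!needsQuoting) {
            return name;
        }
        String escaped = name.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    // Attribute name first, then the set of nominal values.
    static String binaryAttribute(String ngram) {
        return "@attribute " + quoteName(ngram) + " {0,1}";
    }

    // Assemble the relation line, one attribute line per n-gram, and the @data marker.
    static String header(String relation, List<String> ngrams) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(quoteName(relation)).append("\n\n");
        for (String ngram : ngrams) {
            sb.append(binaryAttribute(ngram)).append('\n');
        }
        sb.append("\n@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints a header whose first attribute line is: @attribute 'not good' {0,1}
        System.out.print(header("words", Arrays.asList("not good", "good")));
    }
}
```

The data rows (one 0 or 1 per attribute, comma-separated) would follow the `@data` line. Unigrams such as `good` are left unquoted here because they contain no special characters; quoting them as well would be equally valid.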