I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it does not print "A".
attVals = new FastVector();
attVals.addElement("A");
attVals.addElement("B");
atts.addElement(new Attribute("class", attVals));
vals[index] = attVals.indexOf("A");
The output of the program is:
{0 6,2 8}
but I expected {0 6,2 8,3 A}.
But when I do
vals[index] = attVals.indexOf("B");
I get the expected output:
{0 6,2 8,3 B}
For some reason it is not taking the index 0. Can someone tell me why this is happening?
This is a very common problem. The sparse format, by definition, does not store 0 values, and "A" is the first nominal value, so it is stored internally as index 0.
The Weka ARFF format page says this explicitly:
Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the arff file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.
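To make the mechanism concrete, here is a minimal sketch in plain Java. It is not Weka's code: the class SparseSketch and the helper toSparse are hypothetical names that only mimic the rule "values stored as 0 are not written". The row values 6, 0, 8 are taken from the output in the question, and the class attribute is assumed to sit at index 3.

```java
import java.util.Arrays;
import java.util.List;

public class SparseSketch {
    // Hypothetical helper (not part of Weka): renders a row in sparse style,
    // skipping every attribute whose internal value is 0.
    public static String toSparse(double[] vals, List<String> labels, int classIndex) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < vals.length; i++) {
            if (vals[i] == 0) {
                continue; // zeros are implicit in the sparse format
            }
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(i).append(' ');
            if (i == classIndex) {
                sb.append(labels.get((int) vals[i])); // nominal: print the label
            } else {
                sb.append((int) vals[i]);             // numeric: print the number
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("A", "B");
        double[] row = {6, 0, 8, labels.indexOf("A")}; // "A" is stored as 0
        System.out.println(toSparse(row, labels, 3));   // {0 6,2 8}
        row[3] = labels.indexOf("B");                    // "B" is stored as 1
        System.out.println(toSparse(row, labels, 3));   // {0 6,2 8,3 B}
    }
}
```

The class label "A" disappears for the same reason the numeric 0 at attribute 1 does: both are stored as 0.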
You have to put a dummy value in the first position (index 0) of the attribute's value list, so that no real label is stored as 0. Just modify your code to:
attVals = new FastVector();
attVals.addElement("dummy");
attVals.addElement("A");
attVals.addElement("B");
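As a quick sanity check of why this works, the sketch below uses plain java.util lists rather than Weka's FastVector; the class DummyIndexSketch and the method survivesSparseWrite are hypothetical names for illustration. It only shows that the dummy entry shifts "A" from index 0 to index 1, so its stored value is no longer 0.

```java
import java.util.Arrays;
import java.util.List;

public class DummyIndexSketch {
    // Hypothetical check (not a Weka API): a label is written in a sparse row
    // only if its index in the value list is greater than 0.
    public static boolean survivesSparseWrite(List<String> labels, String label) {
        return labels.indexOf(label) > 0;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("A", "B");
        List<String> withDummy = Arrays.asList("dummy", "A", "B");

        System.out.println(original.indexOf("A"));   // 0
        System.out.println(withDummy.indexOf("A"));  // 1

        System.out.println(survivesSparseWrite(original, "A"));   // false
        System.out.println(survivesSparseWrite(withDummy, "A"));  // true
    }
}
```

Note that the dummy value will also appear in the attribute declaration of the saved file, since it is part of the value list.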
Let me know if you need any further help.