I've been trying to change several Weka String attributes to Nominal attributes using StringToNominal.
Using the filter without options produces the desired result for the class attribute. StringToNominal defaults to the last attribute, and the class is successfully converted to a Nominal:
StringToNominal stringFilter = new StringToNominal();
stringFilter.setInputFormat(insts);
Instances filteredInsts = Filter.useFilter(insts, stringFilter);
However, if I try to do the same thing by passing the range as an option, the class attribute remains a String:
StringToNominal stringFilter = new StringToNominal();
String[] options = new String[2];
options[0] = "-R"; //Range option
options[1] = Integer.toString(insts.classIndex()); //The class attribute index
stringFilter.setOptions(options);
stringFilter.setInputFormat(insts);
Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilter);
Here is an MCVE for the error. It prints "true" and then "false". The correct output is "true" both times.
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToNominal;

public class TestStringToNominal {
    public static void main(String[] args) throws Exception {
        // One numeric attribute and one String attribute that serves as the class.
        ArrayList<Attribute> attInfo = new ArrayList<Attribute>();
        attInfo.add(new Attribute("val"));
        attInfo.add(new Attribute("class", (ArrayList<String>) null));
        Instances insts = new Instances("test instances", attInfo, 1);
        insts.setClassIndex(1);

        Instance i1 = new DenseInstance(2);
        i1.setValue(attInfo.get(0), 0);
        i1.setValue(attInfo.get(1), "first");
        insts.add(i1);

        Instance i2 = new DenseInstance(2);
        i2.setValue(attInfo.get(0), 1);
        i2.setValue(attInfo.get(1), "second");
        insts.add(i2);

        // Case 1: no options, so the filter uses its default range.
        StringToNominal stringFilter = new StringToNominal();
        stringFilter.setInputFormat(insts);
        Instances filteredInsts = Filter.useFilter(insts, stringFilter);
        System.out.println(filteredInsts.classAttribute().isNominal());

        // Case 2: the range is passed explicitly through -R.
        StringToNominal stringFilterWOpts = new StringToNominal();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = Integer.toString(insts.classIndex());
        stringFilterWOpts.setOptions(options);
        stringFilterWOpts.setInputFormat(insts);
        Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilterWOpts);
        System.out.println(filteredInstsWOpts.classAttribute().isNominal());
    }
}
I couldn't find this documented anywhere, but Instances seems to use zero-based indexing while StringToNominal seems to use one-based indexing.
Changing
options[1] = Integer.toString(insts.classIndex());
to
options[1] = Integer.toString(insts.classIndex() + 1);
produces the desired output of "true, true".
The reason this is not immediately apparent is that StringToNominal has no effect on non-String attributes. In the example, the value passed to -R selects the numeric attribute instead of the class, so the filter runs without error and changes nothing.
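To keep the off-by-one conversion in one place, the fix above could be wrapped in a small helper. This is only a sketch: RangeOptions and forZeroBasedIndex are hypothetical names, not part of Weka, and the code assumes the -R option is one-based as observed above. It has no Weka dependency, so it can be checked on its own.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Weka): builds the {"-R", index} option pair
// from a zero-based attribute index such as the one returned by Instances.classIndex().
public class RangeOptions {

    public static String[] forZeroBasedIndex(int zeroBasedIndex) {
        // classIndex() is negative when no class attribute has been set.
        if (zeroBasedIndex < 0) {
            throw new IllegalArgumentException("attribute index not set: " + zeroBasedIndex);
        }
        // Assumption: the filter's -R range option counts attributes from 1.
        return new String[] {"-R", Integer.toString(zeroBasedIndex + 1)};
    }

    public static void main(String[] args) {
        // Zero-based index 1 (the class in the example above) becomes "2".
        System.out.println(Arrays.toString(forZeroBasedIndex(1))); // prints [-R, 2]
    }
}
```

With this in place, the call in the question would become stringFilter.setOptions(RangeOptions.forZeroBasedIndex(insts.classIndex()));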