I've been trying to change several Weka String attributes to Nominal attributes using StringToNominal.
Using the filter without options produces the desired result for the class attribute: StringToNominal defaults to the last attribute, so the class is successfully converted to Nominal.
StringToNominal stringFilter = new StringToNominal();
stringFilter.setInputFormat(insts); // the input format must be set before useFilter
Instances filteredInsts = Filter.useFilter(insts, stringFilter);
However, if I try to do the same thing by passing an option, the class attribute remains a String.
StringToNominal stringFilter = new StringToNominal();
String[] options = new String[2];
options[0] = "-R"; // Range option
options[1] = Integer.toString(insts.classIndex()); // The class attribute index
stringFilter.setOptions(options); // apply the options before setting the input format
stringFilter.setInputFormat(insts);
Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilter);
Here is an MCVE for the error. It produces the output "true, false"; the expected output is "true, true".
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToNominal;
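public class TestStringToNominal {
    public static void main(String[] args) throws Exception {
        // Build a dataset with one numeric attribute and one String class attribute
        ArrayList<Attribute> attInfo = new ArrayList<Attribute>();
        attInfo.add(new Attribute("val"));
        attInfo.add(new Attribute("class", (ArrayList<String>) null));
        Instances insts = new Instances("test instances", attInfo, 1);
        insts.setClassIndex(1);
        Instance i1 = new DenseInstance(2);
        i1.setValue(attInfo.get(0), 0);
        i1.setValue(attInfo.get(1), "first");
        insts.add(i1);
        Instance i2 = new DenseInstance(2);
        i2.setValue(attInfo.get(0), 1);
        i2.setValue(attInfo.get(1), "second");
        insts.add(i2);

        // Filter without options: defaults to the last attribute
        StringToNominal stringFilter = new StringToNominal();
        stringFilter.setInputFormat(insts);
        Instances filteredInsts = Filter.useFilter(insts, stringFilter);

        // Filter with an explicit range option
        StringToNominal stringFilterWOpts = new StringToNominal();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = Integer.toString(insts.classIndex());
        stringFilterWOpts.setOptions(options);
        stringFilterWOpts.setInputFormat(insts);
        Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilterWOpts);

        // Report whether the class attribute is Nominal after each filter
        System.out.println(filteredInsts.attribute(1).isNominal() + ", "
                + filteredInstsWOpts.attribute(1).isNominal());
    }
}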
I couldn't find this documented anywhere, but Instances seems to use zero-based indexing while StringToNominal seems to use one-based indexing.
Changing the line
options[1] = Integer.toString(insts.classIndex());
to
options[1] = Integer.toString(insts.classIndex() + 1);
produces the desired output of "true, true".
The reason this is not immediately apparent is that StringToNominal has no effect on non-String attributes. In the example, it acts on the zeroth attribute, which is numeric, so nothing visibly changes.
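To keep the off-by-one conversion in one place, the index shift could be wrapped in a small helper. The following is only a sketch in plain Java: the class OneBasedRangeOption and its methods are hypothetical names, not part of Weka, and it assumes the "-R" option expects a one-based index, as observed above.

```java
import java.util.Arrays;

public class OneBasedRangeOption {
    // Hypothetical helper (not part of Weka): converts a zero-based attribute
    // index, such as the value returned by Instances.classIndex(), into the
    // one-based index string that is passed after "-R".
    public static String toOneBased(int zeroBasedIndex) {
        if (zeroBasedIndex < 0) {
            // classIndex() is negative when no class attribute has been set.
            throw new IllegalArgumentException("No attribute at index " + zeroBasedIndex);
        }
        return Integer.toString(zeroBasedIndex + 1);
    }

    // Builds the two-element option array used in the snippets above.
    public static String[] rangeOptions(int zeroBasedIndex) {
        return new String[] {"-R", toOneBased(zeroBasedIndex)};
    }

    public static void main(String[] args) {
        // In the MCVE the class attribute sits at zero-based index 1.
        System.out.println(Arrays.toString(rangeOptions(1))); // prints [-R, 2]
    }
}
```

With this in place, the options in the MCVE would be built as rangeOptions(insts.classIndex()). Failing fast on a negative index also catches the case where the class index was never set, which would otherwise produce an invalid range.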