
Options added to StringToNominal have no effect


Problem statement

I've been trying to change several Weka String attributes to Nominal attributes using StringToNominal.

Using the filter without options produces the desired result for the class attribute. StringToNominal defaults to the last attribute, and the class is successfully converted to a nominal attribute:

StringToNominal stringFilter = new StringToNominal();
stringFilter.setInputFormat(insts);
Instances filteredInsts = Filter.useFilter(insts, stringFilter); 

However, if I try to do the same thing by passing the range explicitly as an option, the class attribute remains a String:

StringToNominal stringFilter = new StringToNominal();
String[] options = new String[2];
options[0] = "-R"; //Range option
options[1] = Integer.toString(insts.classIndex()); //The class attribute index
stringFilter.setOptions(options);
stringFilter.setInputFormat(insts);
Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilter);

MCVE

Here is an MCVE for the error. It prints "true" and then "false". The expected output is "true" and then "true".

import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToNominal;

public class TestStringToNominal {

    public static void main(String[] args) throws Exception {
        ArrayList<Attribute> attInfo = new ArrayList<Attribute>();
        attInfo.add(new Attribute("val"));
        attInfo.add(new Attribute("class", (ArrayList<String>) null));

        Instances insts = new Instances("test instances", attInfo, 1);
        insts.setClassIndex(1);

        Instance i1 = new DenseInstance(2);
        i1.setValue(attInfo.get(0), 0);
        i1.setValue(attInfo.get(1), "first");
        insts.add(i1);

        Instance i2 = new DenseInstance(2);
        i2.setValue(attInfo.get(0), 1);
        i2.setValue(attInfo.get(1), "second");
        insts.add(i2);

        StringToNominal stringFilter = new StringToNominal();
        stringFilter.setInputFormat(insts);
        Instances filteredInsts = Filter.useFilter(insts, stringFilter);
        System.out.println(filteredInsts.classAttribute().isNominal());

        StringToNominal stringFilterWOpts = new StringToNominal();
        String[] options = new String[2];
        options[0] = "-R";
        options[1] = Integer.toString(insts.classIndex());
        stringFilterWOpts.setOptions(options);
        stringFilterWOpts.setInputFormat(insts);
        Instances filteredInstsWOpts = Filter.useFilter(insts, stringFilterWOpts);
        System.out.println(filteredInstsWOpts.classAttribute().isNominal());
    }

}

Solution

  • I couldn't find this documented anywhere, but Instances seems to use zero-based attribute indices while the -R option of StringToNominal seems to use one-based indices.

    Changing

    options[1] = Integer.toString(insts.classIndex());
    

    To

    options[1] = Integer.toString(insts.classIndex() + 1);
    

    produces the desired output of "true" followed by "true".

    The reason this is not immediately apparent is that StringToNominal has no effect on non-String attributes. In the example, insts.classIndex() returns 1, which the filter reads as the first attribute (zero-based index 0, the numeric "val"), so it runs without any visible effect and without an error.
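
The off-by-one conversion above can be wrapped in a small helper so it is not forgotten at each call site. The following is only a sketch: WekaRangeHelper and toOneBasedRange are hypothetical names, not part of Weka, and it assumes (as observed above) that -R takes one-based, comma-separated indices.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Weka. It converts zero-based attribute
// indices (such as the value returned by Instances.classIndex()) into a
// one-based, comma-separated string, which is the form the -R option
// appears to expect.
final class WekaRangeHelper {

    private WekaRangeHelper() {
        // static utility, not meant to be instantiated
    }

    static String toOneBasedRange(int... zeroBasedIndices) {
        StringJoiner joiner = new StringJoiner(",");
        for (int index : zeroBasedIndices) {
            // classIndex() returns a negative value when no class is set,
            // so reject negatives instead of silently producing "0".
            if (index < 0) {
                throw new IllegalArgumentException("Negative attribute index: " + index);
            }
            joiner.add(Integer.toString(index + 1));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOneBasedRange(1));    // prints "2"
        System.out.println(toOneBasedRange(0, 2)); // prints "1,3"
    }
}
```

In the MCVE this would be used as options[1] = WekaRangeHelper.toOneBasedRange(insts.classIndex());. When the class is the last attribute, as it is here, passing "last" as the range should also work, since that is the filter's default.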