
Only one ranked attribute, but two selected? InfoGain Ranker in Weka


I've run an InfoGain evaluation on my dataset, using a Ranker with a threshold of 0.1.

My output via the GUI says:

Search Method:
    Attribute ranking.
    Threshold for discarding attributes:   0.1   

Attribute Evaluator (supervised, Class (nominal): 23 class):
    Information Gain Ranking Filter

Ranked attributes:
 0.141    2 nr_visits

Selected attributes: 2 : 1

In my Java implementation, I do the same thing:

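import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;

// Rank attributes and discard those scoring below the threshold
Ranker ranker = new Ranker();
ranker.setGenerateRanking(true);
ranker.setThreshold(0.1);

AttributeSelection attsel = new AttributeSelection();
InfoGainAttributeEval eval = new InfoGainAttributeEval();

attsel.setEvaluator(eval);
attsel.setSearch(ranker);

// instances: my dataset (weka.core.Instances) with the class index set
attsel.SelectAttributes(instances);

int[] ranked_attr = attsel.selectedAttributes();   // zero-based attribute indices
double[][] rawscores = attsel.rankedAttributes();  // rows of [attribute index, score]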

This gives similar, but not identical, output:

  • ranked_attr is [1, 21] (1 is the nr_visits feature, 21 is another attribute)
  • rawscores does NOT contain ANY entry for 21. It contains attribute 1, followed by another feature whose score is below my threshold.

What gives? Are there one or two selected features? Is this a bug in Weka 3.8.4?


Solution

  • Thanks to Eibe on the mailing list:

    AFAIK, the set of indices returned by selectedAttributes() includes the index of the class attribute. I assume that attribute 22 in your data is the class attribute. There is no score for the class attribute because it is the attribute that we are trying to predict.

    Indeed, 21 was my class index. Indices are zero-based in code but one-based in the GUI, which is why I didn't notice immediately.
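To get only the selected features in code, the class index can be filtered out of the array returned by selectedAttributes(). Below is a minimal sketch in plain Java using the values from the question. The class name and both helper methods (withoutClassIndex, toOneBased) are hypothetical and not part of Weka. In real code the array would come from attsel.selectedAttributes() and the class index from instances.classIndex().

```java
import java.util.Arrays;

public class SelectedAttributesDemo {

    // Hypothetical helper: drop the class index from the selected indices.
    static int[] withoutClassIndex(int[] selected, int classIndex) {
        return Arrays.stream(selected)
                     .filter(i -> i != classIndex)
                     .toArray();
    }

    // Hypothetical helper: convert zero-based indices (as used in code)
    // to one-based indices (as shown in the GUI).
    static int[] toOneBased(int[] zeroBased) {
        return Arrays.stream(zeroBased)
                     .map(i -> i + 1)
                     .toArray();
    }

    public static void main(String[] args) {
        // Values taken from the question; with Weka these would be
        // attsel.selectedAttributes() and instances.classIndex().
        int[] selected = {1, 21};
        int classIndex = 21;

        int[] features = withoutClassIndex(selected, classIndex);
        System.out.println(Arrays.toString(features));             // prints [1]
        System.out.println(Arrays.toString(toOneBased(features))); // prints [2]
    }
}
```

After filtering, a single feature remains: index 1 in code, shown as attribute 2 (nr_visits) in the GUI, which matches the one ranked attribute in the GUI output.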