machine-learning, weka, feature-selection

How to select the top n features using Information Gain as the criterion


I have a training.arff file in which each entry has 2000 features (attributes). I want to select the top n of those attributes using the Information Gain criterion. How can I do that with WEKA from the command line? From what I have read online, it seems to be a two-stage process, because I have to use a ranker as the second step. Could someone explain to me how to do it?


Solution

  • You can do both stages in a single call with the AttributeSelection filter:

    java weka.filters.supervised.attribute.AttributeSelection \
    -E "weka.attributeSelection.InfoGainAttributeEval" \
    -S "weka.attributeSelection.Ranker -N 10" -i training.arff -o training_IG.arff
    

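    The -E option names the class to use as the attribute evaluator (here, information gain), and -S names the search method (here, Ranker, which orders the attributes by their evaluator scores). The -N option belongs to the Ranker and sets how many attributes to keep, so -N 10 keeps the top 10. The -i and -o options give the input and output ARFF files.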
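
If you need to try several values of n, it may help to wrap the command in a small shell function. The following is only a sketch: the names select_top_n, WEKA_JAR, and DRY_RUN are made up for illustration, and it assumes the Weka jar is at the path given in WEKA_JAR (the command above instead relies on Weka already being on the classpath). With DRY_RUN=1 it prints the arguments, one per line, instead of running Weka.

```shell
#!/bin/sh
# Sketch only: WEKA_JAR, DRY_RUN, and select_top_n are hypothetical names.
WEKA_JAR="${WEKA_JAR:-weka.jar}"   # assumed location of the Weka jar

# select_top_n N INPUT OUTPUT
# Keeps the N attributes ranked highest by information gain.
select_top_n() {
  n="$1"; input="$2"; output="$3"
  # Build the argument list so the quoted -E and -S values stay intact.
  set -- java -cp "$WEKA_JAR" \
    weka.filters.supervised.attribute.AttributeSelection \
    -E "weka.attributeSelection.InfoGainAttributeEval" \
    -S "weka.attributeSelection.Ranker -N $n" \
    -i "$input" -o "$output"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$@"   # print one argument per line instead of running
  else
    "$@"                 # run the filter
  fi
}

# Preview the command for n = 50 without invoking Weka.
DRY_RUN=1
select_top_n 50 training.arff training_IG_50.arff
```

Set DRY_RUN=0 (or unset it) to actually run the filter, for example `select_top_n 50 training.arff training_IG_50.arff`.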