
How many features does CfsSubsetEval select in each step of cross-validation during feature selection, and by what criteria?


I am quite new to WEKA, and I have a dataset of 111 cases with 109 attributes. I am using the attribute selection ("Select attributes") tab in WEKA with CfsSubsetEval and the BestFirst search method for feature selection. I am using leave-one-out cross-validation.

So, how many features does WEKA pick, or what is the stopping criterion that determines how many features this method selects in each step of cross-validation?

Thanks,

Gopi


Solution

The CfsSubsetEval algorithm searches for a subset of features that work well together: features with low correlation to each other and high correlation to the target label. The score of a subset is called its merit (you can see it in the output).
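As I understand it, the merit follows the CFS heuristic from Mark Hall's work: Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the number of features in the subset, r_cf is their average correlation with the class and r_ff is the average correlation between pairs of them. Below is a minimal plain-Java sketch of just that formula. It is not WEKA's code: the class name `CfsMerit` and the correlation values are made up for illustration, and WEKA computes the correlations from your data itself.

```java
// Minimal sketch of the CFS merit heuristic (not WEKA's implementation):
// merit = k * avgFeatureClassCorr / sqrt(k + k * (k - 1) * avgFeatureFeatureCorr)
public class CfsMerit {

    /**
     * @param featureClassCorr   absolute correlation of each selected feature with the class (length k)
     * @param featureFeatureCorr symmetric k x k matrix of absolute correlations between the features
     */
    static double merit(double[] featureClassCorr, double[][] featureFeatureCorr) {
        int k = featureClassCorr.length;
        if (k == 0) {
            return 0.0; // an empty subset has no predictive value
        }

        // Average feature-to-class correlation
        double sumCf = 0.0;
        for (double r : featureClassCorr) {
            sumCf += r;
        }
        double avgCf = sumCf / k;

        // Average feature-to-feature correlation over all distinct pairs
        double sumFf = 0.0;
        int pairs = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                sumFf += featureFeatureCorr[i][j];
                pairs++;
            }
        }
        double avgFf = pairs == 0 ? 0.0 : sumFf / pairs;

        return k * avgCf / Math.sqrt(k + k * (k - 1) * avgFf);
    }

    public static void main(String[] args) {
        // Hypothetical correlations: two features equally correlated with the class
        double[] cf = {0.6, 0.6};
        double[][] redundant = {{1.0, 0.9}, {0.9, 1.0}};     // features largely duplicate each other
        double[][] complementary = {{1.0, 0.1}, {0.1, 1.0}}; // features carry different information
        System.out.printf("redundant pair:     %.4f%n", merit(cf, redundant));
        System.out.printf("complementary pair: %.4f%n", merit(cf, complementary));
    }
}
```

With these made-up numbers the complementary pair scores higher (about 0.81 versus 0.62), which is why CFS tends to avoid adding features that are strongly correlated with ones it already has.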

The BestFirst search does not let you specify how many features to select. However, you can use other methods, such as GreedyStepwise, or an information gain or gain ratio evaluator (InfoGainAttributeEval or GainRatioAttributeEval) with Ranker, and define the size of the feature set there.
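To illustrate the Ranker approach: each attribute gets its own score (for example from InfoGainAttributeEval), the attributes are sorted by that score, and you keep the top N, which in WEKA you set through Ranker's `numToSelect` option. Here is a minimal sketch of that idea in plain Java; the class name `TopNRanker` and the attribute names and scores are hypothetical, not WEKA output.

```java
import java.util.ArrayList;
import java.util.List;

public class TopNRanker {

    /** Returns the names of the n highest-scoring attributes, best first. */
    static List<String> selectTopN(String[] names, double[] scores, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            order.add(i);
        }
        // Sort attribute indices by descending score (List.sort is stable, so ties keep input order)
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));

        // Keep only the first n entries of the ranking
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(n, order.size()); i++) {
            selected.add(names[order.get(i)]);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical per-attribute scores (e.g. information gain)
        String[] names = {"a1", "a2", "a3", "a4", "a5"};
        double[] scores = {0.12, 0.40, 0.05, 0.33, 0.21};
        System.out.println(selectTopN(names, scores, 3)); // prints [a2, a4, a5]
    }
}
```

Note the trade-off: ranking scores each attribute on its own, so unlike CfsSubsetEval it does not penalise redundant attributes, but it gives you direct control over the set size.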

Another option for influencing the size of the selected set is the search direction (forward, backward, or bi-directional).
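As for the stopping criterion itself: BestFirst does not stop at a fixed size. As far as I know, it keeps expanding the most promising subset found so far and stops after a set number of consecutive expansions that fail to improve the best merit (BestFirst's `searchTermination` option, 5 by default). The final size is simply the size of the best-merit subset. Below is a simplified, forward-only sketch of that loop. The class name `BestFirstSketch` and the merit function are hypothetical stand-ins for CfsSubsetEval, and WEKA's real implementation has additional details not shown here.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.ToDoubleFunction;

public class BestFirstSketch {

    /** A candidate feature subset together with its merit. */
    static final class Node {
        final BitSet subset;
        final double merit;

        Node(BitSet subset, double merit) {
            this.subset = subset;
            this.merit = merit;
        }
    }

    /**
     * Forward best-first search over feature subsets. Stops after maxStale consecutive
     * expansions that do not improve the best merit, or when there is nothing left to expand.
     */
    static BitSet search(int numFeatures, ToDoubleFunction<BitSet> merit, int maxStale) {
        // Open list ordered so the highest-merit subset is expanded next
        PriorityQueue<Node> open = new PriorityQueue<>((x, y) -> Double.compare(y.merit, x.merit));
        Set<BitSet> visited = new HashSet<>();

        BitSet empty = new BitSet(numFeatures);
        Node best = new Node(empty, merit.applyAsDouble(empty));
        open.add(best);
        visited.add(empty);

        int stale = 0;
        while (!open.isEmpty() && stale < maxStale) {
            Node current = open.poll();
            boolean improved = false;
            // Forward direction: try adding each feature not yet in the subset
            for (int f = 0; f < numFeatures; f++) {
                if (current.subset.get(f)) {
                    continue;
                }
                BitSet child = (BitSet) current.subset.clone();
                child.set(f);
                if (!visited.add(child)) {
                    continue; // this subset was already evaluated
                }
                Node node = new Node(child, merit.applyAsDouble(child));
                open.add(node);
                if (node.merit > best.merit) {
                    best = node;
                    improved = true;
                }
            }
            stale = improved ? 0 : stale + 1;
        }
        return best.subset;
    }

    public static void main(String[] args) {
        // Hypothetical merit: features 0 and 2 are useful, every feature costs 0.1
        ToDoubleFunction<BitSet> toyMerit = s ->
                (s.get(0) ? 0.5 : 0.0) + (s.get(2) ? 0.3 : 0.0) - 0.1 * s.cardinality();
        System.out.println(search(5, toyMerit, 5)); // prints {0, 2}
    }
}
```

Because the search runs separately on the training data of each fold, the number of selected features can differ from fold to fold in your leave-one-out setup.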

Good luck!