c++, opencv, machine-learning, feature-selection, adaboost

How do I use AdaBoost for feature selection?


I want to use AdaBoost to choose a good set of features from a large number (~100k). AdaBoost works by iterating through the feature set and adding in features based on how well they perform. It chooses features that perform well on samples that were mis-classified by the existing feature set.

I'm currently using OpenCV's CvBoost. I got an example working, but from the documentation it is not clear how to pull out the feature indexes that it has used.

Using either CvBoost, a 3rd-party library, or my own implementation, how can I pull out a set of features from a large feature set using AdaBoost?


Solution

  • Disclaimer: I am not a user of OpenCV. From the documentation, OpenCV's AdaBoost uses decision trees (either classification trees or regression trees) as the fundamental weak learners.

    It seems to me that this is the way to get at the underlying weak learners:

    CvBoost::get_weak_predictors
    Returns the sequence of weak tree classifiers.
    
    C++: CvSeq* CvBoost::get_weak_predictors()
    The method returns the sequence of weak classifiers. 
    Each element of the sequence is a pointer to the CvBoostTree class or 
    to some of its derivatives.
    

    Once you have access to the sequence of CvBoostTree*, you should be able to inspect which features are contained in each tree and what the split values are.
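
    A minimal sketch of that inspection, assuming an already trained CvBoost named "boost" and the OpenCV 2.x C-struct ML API; the names collectStumpFeatures and featureIdx are just illustrative:

    #include <opencv2/core/core_c.h>
    #include <opencv2/ml/ml.hpp>
    #include <set>
    #include <cstdio>

    // Collect the feature index used by the root split of every weak tree.
    // For depth-1 trees (stumps) this is the complete set of selected features.
    std::set<int> collectStumpFeatures(CvBoost& boost)
    {
        std::set<int> featureIdx;

        CvSeq* weak = boost.get_weak_predictors();
        CvSeqReader reader;
        cvStartReadSeq(weak, &reader);

        for (int i = 0; i < weak->total; ++i)
        {
            CvBoostTree* wtree;
            CV_READ_SEQ_ELEM(wtree, reader);   // each element is a CvBoostTree*

            const CvDTreeNode* root = wtree->get_root();
            if (root && root->split)
            {
                // var_idx is the feature the root split uses; ord.c is the
                // threshold for ordered (numeric) features. Note: if train()
                // was called with a var_idx mask, this index refers to the
                // active subset rather than the full feature vector.
                featureIdx.insert(root->split->var_idx);
                std::printf("weak learner %d: feature %d, threshold %f\n",
                            i, root->split->var_idx, root->split->ord.c);
            }
        }
        return featureIdx;
    }

    The resulting set is the list of feature indexes you could keep and feed into a slimmer classifier.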

    If each tree is only a decision stump, only one feature is contained in each weak learner. But if we allow deeper trees, each individual weak learner can combine several features.
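
    If you do use deeper trees, a recursive walk over the nodes (again assuming the OpenCV 2.x CvDTreeNode layout and the includes from the sketch above) would gather every feature a single weak learner touches; calling it with wtree->get_root() inside the loop above replaces the root-only lookup:

    // Recursively record the feature index of every split node in one tree.
    void collectTreeFeatures(const CvDTreeNode* node, std::set<int>& featureIdx)
    {
        if (!node)
            return;
        if (node->split)                              // internal node: splits on one feature
            featureIdx.insert(node->split->var_idx);
        collectTreeFeatures(node->left,  featureIdx);
        collectTreeFeatures(node->right, featureIdx);
    }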

    I took a further look at the CvBoostTree class; unfortunately the class itself does not provide a public method for listing the internal features it uses. But you might want to create your own subclass inheriting from CvBoostTree and expose whatever functionality you need.