
How to get finalModel selected features in randomForest?


I'm using caret to train a parRF model with a tuneGrid that contains a sequence of mtry values (mtry = 3:20).

When the algorithm finishes, it prints the following:

 mtry  ROC        Sens       Spec     
   2    0.7420331  0.6204671  0.7424294
   3    0.7476080  0.6390894  0.7343044
   4    0.7472579  0.6364214  0.7376243
   5    0.7476245  0.6351781  0.7349261
   6    0.7476901  0.6340793  0.7424026
   7    0.7485309  0.6323017  0.7431485
   8    0.7477496  0.6330511  0.7459274
   9    0.7481676  0.6301848  0.7462164
  10    0.7472944  0.6298118  0.7496909
  11    0.7474194  0.6325235  0.7514651
  12    0.7470044  0.6303864  0.7512466
  13    0.7471885  0.6261626  0.7511862
  14    0.7460856  0.6264819  0.7522480
  15    0.7467873  0.6261324  0.7561996
  16    0.7479428  0.6255679  0.7550840
  17    0.7464456  0.6260585  0.7537030
  18    0.7466500  0.6236055  0.7542641
  19    0.7473104  0.6262634  0.7562870
  20    0.7473408  0.6232997  0.7595128

The best ROC was obtained with mtry = 7. Is it possible to extract those seven features?


Solution

  • As I understand it, mtry determines how many features are randomly sampled as candidates at each node/split in the trees. This does not mean that only 7 features were used by the model.

    You want to look at the feature importance of the model when using mtry = 7.

    This may be useful to read: https://topepo.github.io/caret/variable-importance.html

    The documentation includes examples that show how to extract feature importance and explain what the metrics mean.

    Make sure your model is set up correctly before drawing any strong conclusions from the features you extract.
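
A minimal sketch of the points above, in R. It is not the asker's setup: the data come from caret's twoClassSim(), method = "rf" stands in for parRF to keep the example serial, and the mtry grid, fold count, and ntree are arbitrary illustration values. twoClassSummary also needs the pROC package installed.

```r
library(caret)
library(randomForest)

set.seed(42)

# Stand-in data (assumption): simulated two-class problem with 15 numeric predictors
dat <- twoClassSim(200)

ctrl <- trainControl(
  method = "cv",
  number = 3,
  classProbs = TRUE,                  # required for ROC-based tuning
  summaryFunction = twoClassSummary
)

fit <- train(
  Class ~ .,
  data = dat,
  method = "rf",                      # swapped in for parRF (assumption)
  metric = "ROC",
  tuneGrid = data.frame(mtry = c(2, 4, 7)),
  trControl = ctrl,
  importance = TRUE,                  # passed through to randomForest()
  ntree = 100
)

# The mtry value selected by resampling
print(fit$bestTune)

# caret's importance wrapper (scaled to 0-100 by default)
imp <- varImp(fit)
print(imp)

# Raw importance matrix from the underlying randomForest object
raw <- randomForest::importance(fit$finalModel)

# How often each predictor is used in a split across the whole forest
used <- randomForest::varUsed(fit$finalModel, count = TRUE)
names(used) <- rownames(raw)

# Number of predictors that appear in at least one split
n_used <- sum(used > 0)
print(n_used)
```

Comparing n_used with fit$bestTune$mtry shows whether the forest as a whole draws on more predictors than the number sampled at each split, which is why the importance ranking, not mtry, is the thing to inspect.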