I'm using caret to train a parRF model with a tuneGrid that contains a sequence of mtry = 3:20.
When the algorithm finishes, it prints the following:
mtry ROC Sens Spec
2 0.7420331 0.6204671 0.7424294
3 0.7476080 0.6390894 0.7343044
4 0.7472579 0.6364214 0.7376243
5 0.7476245 0.6351781 0.7349261
6 0.7476901 0.6340793 0.7424026
7 0.7485309 0.6323017 0.7431485
8 0.7477496 0.6330511 0.7459274
9 0.7481676 0.6301848 0.7462164
10 0.7472944 0.6298118 0.7496909
11 0.7474194 0.6325235 0.7514651
12 0.7470044 0.6303864 0.7512466
13 0.7471885 0.6261626 0.7511862
14 0.7460856 0.6264819 0.7522480
15 0.7467873 0.6261324 0.7561996
16 0.7479428 0.6255679 0.7550840
17 0.7464456 0.6260585 0.7537030
18 0.7466500 0.6236055 0.7542641
19 0.7473104 0.6262634 0.7562870
20 0.7473408 0.6232997 0.7595128
The best ROC was obtained with mtry = 7. Is it possible to extract those seven features?
I was under the impression that mtry determines how many features are randomly sampled as candidates at each node/split in the trees. This does not mean that only 7 features were used.
You want to look at the feature importance of the model fitted with mtry = 7.
It may be useful to read: https://topepo.github.io/caret/variable-importance.html
There are some examples within the documentation that explain how to extract feature importance and what the metrics mean.
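As a rough sketch of what that could look like (assumptions, none of them from your setup: simulated data from caret's twoClassSim stands in for your data, method = "rf" is used instead of "parRF" so no parallel backend is needed, the grid is shortened to mtry = 2:5, and the randomForest and pROC packages are installed):

```r
library(caret)  # randomForest and pROC must also be installed

set.seed(1)
dat <- twoClassSim(200)  # simulated two-class data, a stand-in for the real data set

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec

fit <- train(Class ~ ., data = dat,
             method = "rf",            # assumption: "rf" rather than "parRF"
             metric = "ROC",
             tuneGrid = expand.grid(mtry = 2:5),
             trControl = ctrl,
             importance = TRUE)        # passed through to randomForest()

fit$bestTune           # the mtry value with the best resampled ROC

# Importance is computed on the final model, refit at the best mtry
imp <- varImp(fit)
imp$importance

# How many predictors the final forest actually split on at least once;
# this is usually far more than mtry
used <- randomForest::varUsed(fit$finalModel)
sum(used > 0)
```

The importance table has one row per predictor, not mtry rows, which is the point: every feature can end up in the forest, and mtry only limits how many are considered at each split.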
Make sure your model is set up correctly before making any strong assumptions about those features you extract.