I would like to use cross-validation to choose the number of variables to try at each split in a random forest. I don't understand how to use the mtry argument of the rfcv() function.
I have 6 predictors in my dataset. I want to try mtry = 6, 5, 4, 3, 2, 1, i.e., every possible value of m, and evaluate each with 5-fold CV.
I believe this can be done with the rfcv() function of the randomForest package. I am running this code:
rf_cv<- rfcv(training_x,training_y,cv.fold=5, mtry=function(p) max(1, p-1))
However, calling rf_cv$n.var gives me:
[1] 6 3 1
So rfcv() does not apply mtry as I had hoped: I expected the function I passed to reduce the number of variables by 1 at each step.
How can I try every number of variables, running a 5-fold cross-validation for each one?
I checked this post, but it is not directly relevant, since it discusses the default value of mtry.
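The post you referenced explains how step determines the numbers of variables (n.var) that are tested; the mtry function is then applied to each of those numbers. In your case p = 6, and since you did not change step or scale, the defaults (scale = "log", step = 0.5) give: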
p=6; step=0.5
k <- floor(log(p, base = 1/step))
n.var <- round(p * step^(0:(k - 1)))
[1] 6 3
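If n.var does not include 1, rfcv() appends it for you, which gives 6, 3, 1. So if you want to try every number, set mtry to identity (so that mtry equals the number of variables at each size), set step to -1, and set scale to anything other than "log" (yeah, the code doesn't give you other options; any other value falls through to a plain sequence from p down to 1 in increments of step):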
rf_cv <- rfcv(matrix(rnorm(100 * 6), ncol = 6), rnorm(100), cv.fold = 3,
              mtry = identity, scale = "new", step = -1)
rf_cv$n.var
[1] 6 5 4 3 2 1
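To see why the two calls produce different grids, the logic described above can be sketched in base R without fitting any forests. This is only an illustration: n_var_grid is a hypothetical helper, not a function in randomForest, and it assumes p >= 2 and mirrors the two branches as described in this answer rather than reproducing the package source line by line.

```r
# Hypothetical helper (not part of randomForest): rebuilds the grid of
# variable counts described above for p predictors. Assumes p >= 2.
n_var_grid <- function(p, scale = "log", step = 0.5) {
  if (scale == "log") {
    # Geometric grid: p, p * step, p * step^2, ...
    k <- floor(log(p, base = 1 / step))
    n.var <- round(p * step^(0:(k - 1)))
    n.var <- unique(n.var)                    # drop sizes repeated by rounding
    if (!1 %in% n.var) n.var <- c(n.var, 1)   # always finish with 1 variable
  } else {
    # Any other scale value: linear grid, so step must be negative here
    n.var <- seq(from = p, to = 1, by = step)
  }
  n.var
}

n_var_grid(6)                              # default settings: 6 3 1
n_var_grid(6, scale = "new", step = -1)    # 6 5 4 3 2 1
```

Note that a positive step in the non-log branch would make seq() fail, because the sequence runs downward from p to 1; that is why step = -1 is needed.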