I trained a random forest:
model <- randomForest(x, y, proximity=TRUE)
When I want to predict y for new objects, I use
y_pred <- predict(model, xnew)
How can I calculate the proximity between the new objects (xnew) and the training set (x) using the existing forest (model)? The proximity option in the predict function gives only the proximities among the new objects (xnew). I could run randomForest unsupervised again on a combined data set (x and xnew) to get the proximities, but I think there must be some way to avoid building the forest again and to use the existing one instead.
Thanks! Kilian
I believe what you want is to specify your test observations in the randomForest call itself, something like this:
library(randomForest)
set.seed(71)
ind <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test <- iris[-ind, ]
iris.rf1 <- randomForest(x = train[, 1:4],
                         y = train[, 5],
                         xtest = test[, 1:4],
                         ytest = test[, 5],
                         importance = TRUE,
                         proximity = TRUE)
dim(iris.rf1$test$prox)
[1] 10 150
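So that gives you the proximity from each of the ten test cases (rows) to all 150 cases (columns). As far as I know, the columns cover the ten test cases themselves followed by the 140 training cases, so the test-to-training proximities are a subset of the columns.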
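If you only want the test-to-training part, something like the following might work. It is only a sketch: it assumes the test cases occupy the first columns of the matrix, which is worth checking against colnames() on your own output, and the names rf, prox and prox_test_train are made up for illustration.

```r
library(randomForest)

set.seed(71)
ind   <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test  <- iris[-ind, ]

rf <- randomForest(x = train[, 1:4], y = train[, 5],
                   xtest = test[, 1:4], proximity = TRUE)

# Rows are the test cases; columns are test cases plus training cases
prox <- rf$test$proximity

# Assumption: the first nrow(test) columns are the test cases themselves,
# so dropping them leaves the test-to-training proximities
prox_test_train <- prox[, -seq_len(nrow(test))]
dim(prox_test_train)
```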
The only other option, I think, would be to call predict on your new cases rbind-ed to the original training cases. But that way you don't need to have your test cases up front with the randomForest call. In that case, you'll want to use keep.forest = TRUE in the randomForest call and of course set proximity = TRUE when you call predict.
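A rough sketch of that second route, reusing the iris split from above. The object names (iris.rf2, combined, prox_new_train) are just placeholders, and I am assuming the proximity matrix returned by predict follows the row order of the data you pass in, so the new cases come first because they are stacked on top.

```r
library(randomForest)

set.seed(71)
ind   <- sample(1:150, 140, replace = FALSE)
train <- iris[ind, ]
test  <- iris[-ind, ]

# keep.forest = TRUE stores the trees so predict() can reuse them later
iris.rf2 <- randomForest(x = train[, 1:4], y = train[, 5],
                         keep.forest = TRUE)

# Stack the new cases on top of the training cases
combined <- rbind(test[, 1:4], train[, 1:4])
pred <- predict(iris.rf2, combined, proximity = TRUE)

# pred$proximity is square over all rows of 'combined';
# keep rows for the new cases and columns for the training cases
n_new <- nrow(test)
prox_new_train <- pred$proximity[seq_len(n_new), -seq_len(n_new)]
dim(prox_new_train)
```

Note that these proximities come from running every case down every tree, so for the training cases they are not out-of-bag values.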