This is how I currently use Random Forest with the randomForest package:
library (randomForest)
rf1 <- randomForest(CLA ~ ., dat, ntree=100, norm.votes=FALSE)
p1 <- predict(rf1, testing, type='response')
confMat_rf1 <- table(p1,testing_CLA$CLA)
accuracy_rf1 <- sum(diag(confMat_rf1))/sum(confMat_rf1)
I don't want to use the randomForest package at all. Given a dataset (dat), and using rpart with the default values of the randomForest package, how can I get the same results? For instance, for the 100 decision trees, I need to run the following:
library(rpart)
cart.models <- vector("list", 100)
for (i in 1:100) {
  cart.models[[i]] <- rpart(CLA ~ ., data = random_dataset[[i]], cp = -1)
}
Here, each random_dataset[[i]] would hold a randomly chosen subset of attributes and rows, sized according to the randomForest defaults. In addition, is rpart used internally by randomForest?
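The sampling described in the question can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of randomForest: `make_random_datasets` is a hypothetical helper, and `iris` with `Species` stands in for `dat` with `CLA`. It uses randomForest's classification defaults of a bootstrap sample the same size as the data and `floor(sqrt(p))` candidate features. randomForest draws a fresh feature subset at every split, whereas this sketch draws one subset per tree, so the results will not match exactly.

```r
# Hypothetical helper (not part of any package): one random sub-dataset per tree.
make_random_datasets <- function(dat, target, ntree = 100) {
  features <- setdiff(colnames(dat), target)
  # randomForest's default mtry for classification is floor(sqrt(p))
  mtry <- max(1, floor(sqrt(length(features))))
  lapply(seq_len(ntree), function(i) {
    rows <- sample(nrow(dat), replace = TRUE)  # bootstrap sample of all rows
    cols <- sample(features, mtry)             # feature subset, drawn once per tree
    dat[rows, c(cols, target), drop = FALSE]
  })
}

set.seed(1)
# iris / Species are stand-ins for dat / CLA
random_dataset <- make_random_datasets(iris, target = "Species", ntree = 100)
```

Each element of `random_dataset` can then be passed to `rpart` as in the loop above.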
You can approximate a random forest by training multiple rpart trees, each on a bootstrap sample of the training rows and a random subset of the features. Because the features are sampled once per tree rather than at every split, this only approximates what randomForest does. The following snippet trains 10 trees to classify the iris species and returns a list of trees, each paired with its out-of-bag accuracy.
library(rpart)
library(Metrics)
library(doParallel)
library(foreach)
library(ggplot2)
random_forest <- function(train_data, train_formula, method = "class", feature_per = 0.7, cp = 0.01, min_split = 20, min_bucket = round(min_split / 3), max_depth = 30, ntrees = 10) {
  target_variable <- as.character(train_formula)[[2]]
  features <- setdiff(colnames(train_data), target_variable)
  n_features <- length(features)

  # One worker per physical core
  ncores <- detectCores(logical = FALSE)
  cl <- makeCluster(ncores)
  registerDoParallel(cl)

  rf_model <- foreach(
    icount(ntrees),
    .packages = c("rpart", "Metrics")
  ) %dopar% {
    # Random feature subset for this tree (at least one feature)
    bagged_features <- sample(features, max(1, floor(n_features * feature_per)), replace = FALSE)
    # Bootstrap sample of the rows; rows never drawn form the out-of-bag set
    index_bag <- sample(nrow(train_data), replace = TRUE)
    in_train_bag <- train_data[index_bag, c(bagged_features, target_variable)]
    out_train_bag <- train_data[-index_bag, ]

    trControl <- rpart.control(minsplit = min_split, minbucket = min_bucket, cp = cp, maxdepth = max_depth)
    tree <- rpart(formula = train_formula,
                  data = in_train_bag,
                  method = method,
                  control = trControl)

    # Out-of-bag accuracy of this single tree
    oob_pred <- predict(tree, newdata = out_train_bag, type = "class")
    oob_acc <- accuracy(actual = out_train_bag[, target_variable], predicted = oob_pred)
    list(tree = tree, oob_perf = oob_acc)
  }

  stopCluster(cl)
  rf_model
}
train_formula <- as.formula("Species ~ .")
forest <- random_forest(train_data = iris, train_formula = train_formula)
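To get a single prediction and a confusion matrix comparable to the randomForest workflow in the question, the individual trees still need to be combined. A minimal sketch of majority voting is shown below. `predict_forest` is a hypothetical helper, not part of rpart or the code above, and the list of trees is rebuilt sequentially here only so that the example runs on its own.

```r
library(rpart)

# Hypothetical helper: majority vote over a list of fitted rpart
# classification trees.
predict_forest <- function(trees, newdata) {
  # One column of predicted class labels per tree
  votes <- sapply(trees, function(tree) {
    as.character(predict(tree, newdata = newdata, type = "class"))
  })
  votes <- matrix(votes, nrow = nrow(newdata))  # stay a matrix even for one row
  # Most frequent label per row; ties go to the first label in table order
  apply(votes, 1, function(row) names(which.max(table(row))))
}

set.seed(42)
# Stand-in for the trees returned by random_forest(): 10 bootstrap trees
trees <- lapply(1:10, function(i) {
  boot <- iris[sample(nrow(iris), replace = TRUE), ]
  rpart(Species ~ ., data = boot, method = "class")
})

pred <- predict_forest(trees, iris)
conf_mat <- table(predicted = pred, actual = iris$Species)
forest_accuracy <- mean(pred == iris$Species)
```

With the list returned by `random_forest()`, the trees can be extracted first, for example with `lapply(forest, function(m) m$tree)`, and then passed to `predict_forest`. Accuracy measured on the training data, as here, is optimistic; a held-out test set gives a fairer estimate.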