r plot random-forest cross-validation

How to visualise my RF model's performance in R, based on the cross-validation?


I have a model like the following example, and I want to show its stability and consistency/performance across the k folds. What is the best visualization/interpretation for this purpose?

library(caret)

data(iris)
mydf = iris[,1:4]
control = trainControl(method="repeatedcv", number=5, repeats=5, savePredictions = TRUE)
for_train = createDataPartition(mydf$Sepal.Length, p=.66, list=FALSE)
train = mydf[for_train,]
test = mydf[-for_train,]

mytrf_iris = train(Sepal.Length ~ .,
                   data=train, ntree=800, method="rf", metric="Rsquared",
                   trControl=control, importance = TRUE)

Solution

    library(caret)
    #> Loading required package: lattice
    #> Loading required package: ggplot2
    library(tidyverse)
    
    data(iris)
    mydf <- iris[, 1:4]
    control <- trainControl(method = "repeatedcv", number = 5, repeats = 5, savePredictions = TRUE)
    for_train <- createDataPartition(mydf$Sepal.Length, p = .66, list = FALSE)
    train <- mydf[for_train, ]
    test <- mydf[-for_train, ]
    
    mytrf_iris <- train(Sepal.Length ~ .,
      data = train, ntree = 800, method = "rf", metric = "Rsquared", trControl = control, importance = TRUE
    )
    #> note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
    
    plot(mytrf_iris)
    

    mytrf_iris$results
    #>   mtry      RMSE  Rsquared       MAE     RMSESD RsquaredSD      MAESD
    #> 1    2 0.3173592 0.8673463 0.2684422 0.02888781 0.03709340 0.02502770
    #> 2    3 0.3067187 0.8730667 0.2582250 0.02850211 0.03610797 0.02556025
    
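    # Per-fold metrics for the selected mtry, one box per metric.
    # ggplot() is used here because qplot() is deprecated since ggplot2 3.4.0.
    mytrf_iris$resample %>%
      pivot_longer(-Resample) %>%
      ggplot(aes(name, value)) +
      geom_boxplot()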
    

    Created on 2021-12-15 by the reprex package (v2.0.1)
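
The boxplot above pools all 25 folds into one box per metric. To look at stability between repeats as well, one option is to split the `Resample` label into its repeat and draw one box per repeat, with a separate panel per metric because RMSE and Rsquared are on different scales. This is a minimal sketch, not part of the original answer. It assumes caret labels the folds as `Fold1.Rep1`, and the seed, `ntree = 200`, and the names `fit`, `res`, `long`, and `fold_summary` are illustrative choices. The randomForest package must be installed.

```r
library(caret)
library(ggplot2)

set.seed(42)  # assumed seed, only so the split and folds are reproducible
data(iris)
mydf <- iris[, 1:4]
for_train <- createDataPartition(mydf$Sepal.Length, p = .66, list = FALSE)
train_df <- mydf[for_train, ]

control <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
fit <- train(Sepal.Length ~ .,
  data = train_df, method = "rf", ntree = 200,  # fewer trees than the original, for speed
  metric = "Rsquared", trControl = control
)

# One row per held-out fold for the selected mtry
res <- fit$resample
# Assumes labels of the form "Fold1.Rep1"; keep only the "Rep1" part
res$Repeat <- sub("^Fold[0-9]+\\.", "", res$Resample)

# Mean and spread of each metric over all folds
metrics <- c("RMSE", "Rsquared", "MAE")
fold_summary <- data.frame(
  metric = metrics,
  mean = sapply(res[metrics], mean),
  sd = sapply(res[metrics], sd),
  row.names = NULL
)
print(fold_summary)

# Long format: stack() puts the metrics one after another in 'values',
# with the metric name in 'ind'
long <- stack(res[metrics])
long$Repeat <- rep(res$Repeat, times = length(metrics))

p <- ggplot(long, aes(x = Repeat, y = values)) +
  geom_boxplot() +
  facet_wrap(~ind, scales = "free_y") +
  labs(x = "CV repeat", y = "Metric value")
print(p)
```

Boxes that sit at similar heights and have similar widths across repeats suggest that performance does not depend much on how the folds were drawn. The `sd` column in `fold_summary` should be close to the `RMSESD`, `RsquaredSD`, and `MAESD` values that `fit$results` reports for the selected `mtry`.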