Tags: r, ggplot2, plot, data-visualization, mlr

R: plotting results with the mlr library


I am using the R programming language. I am trying to replicate the plots from the following Stack Overflow post using the "mlr" library: "R: multiplot for plotLearnerPrediction ggplot objects of MLR firing errors in RStudio"

(I am also using this site here: https://www.analyticsvidhya.com/blog/2016/08/practicing-machine-learning-techniques-in-r-with-mlr-package/)

First, I created the data for this exercise ("response_variable" is the response; all other variables are the predictors):

    # load libraries
    library(mlr)
    library(gridExtra)
    library(ggplot2)
    library(rpart)
    
    #create data
    
    a = rnorm(1000, 10, 10)
    b = rnorm(1000, 10, 5)
    c = rnorm(1000, 5, 10)
    d <- sample( LETTERS[1:3], 1000, replace=TRUE, prob=c(0.2, 0.6, 0.2) )
    response_variable <- sample( LETTERS[1:2], 1000, replace=TRUE, prob=c(0.3, 0.7) )
    
    data <- data.frame(a, b, c, d, response_variable)
    data$d = as.factor(data$d)
    data$response_variable = as.factor(data$response_variable)

From here, I tried to follow the "mlr" part of the tutorial (using only the "decision tree" and "random forest" algorithms):

    task <- makeClassifTask(data = data, target = "response_variable")

    learners = list(
        "classif.randomForest",
        "classif.rpart" )

    p1 <- plotLearnerPrediction(learner = learners[[1]], task = task)
    p2 <- plotLearnerPrediction(learner = learners[[2]], task = task)

Can someone please tell me whether I have produced these plots the way they are intended to be produced?

Thanks


Solution

  • Yes, the plots are produced as intended. To see this, you can run the same commands on toy data in which the response actually depends on the predictors. From this, you will see that the classification is correct. The only issue is that in your data the response is sampled independently of the predictors, so the classifier has nothing to learn and the classification is poor (in fact, it seems to be predicting everything as "B").

    library(dplyr)

    # toy data: two numeric predictors
    a = rnorm(100, 10, 10)
    b = rnorm(100, 10, 5)

    data <- data.frame(a, b)
    # the response is a deterministic function of a and b
    data = mutate(data, response_variable = ifelse(a > mean(a) | b < mean(b), "A", "B"))
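To finish the toy example, the same mlr calls from the question can be applied to this data. The following is a minimal sketch, not necessarily the exact code the answer had in mind: the seed, the names `toy` and `toy_task`, and the explicit `features` argument are assumptions added for illustration, and it assumes the mlr, randomForest, rpart, ggplot2 and gridExtra packages are installed.

```r
library(mlr)
library(gridExtra)

set.seed(1)  # assumed seed, only to make the sketch reproducible

# toy data: the response is a deterministic function of a and b
a <- rnorm(100, 10, 10)
b <- rnorm(100, 10, 5)
toy <- data.frame(a, b)
toy$response_variable <- factor(ifelse(a > mean(a) | b < mean(b), "A", "B"))

# same kind of task and the same two learners as in the question
toy_task <- makeClassifTask(data = toy, target = "response_variable")

# plotLearnerPrediction() returns a ggplot object; features picks the two axes
p1 <- plotLearnerPrediction("classif.randomForest", toy_task, features = c("a", "b"))
p2 <- plotLearnerPrediction("classif.rpart", toy_task, features = c("a", "b"))

# show both prediction plots side by side
grid.arrange(p1, p2, ncol = 2)
```

In these plots the background regions should roughly follow the rule used to generate the labels. On the data from the question there is no such structure to recover, because the response is sampled independently of the predictors.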