Tags: r, confusion-matrix

How to create a confusion matrix for a decision tree model


I am having some difficulty creating a confusion matrix to compare my model's predictions to the actual values. My data set has 159 explanatory variables, and my target variable is called "classe".

#Load libraries and data
library(caret)
library(rpart)
df <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", na.strings=c("NA","#DIV/0!",""))

#Split into training and validation
index <- createDataPartition(df$classe, times=1, p=0.5)[[1]]
training <- df[index, ]
validation <- df[-index, ]

#Model
decisionTreeModel <- rpart(classe ~ ., data=training, method="class", cp =0.5)

#Predict
pred1 <- predict(decisionTreeModel, validation)

#Check model performance
confusionMatrix(validation$classe, pred1)

The following error message is generated from the code above:

Error in confusionMatrix.default(validation$classe, pred1) : 
  The data must contain some levels that overlap the reference.

I think it may have something to do with the pred1 object that the predict function returns: it's a matrix with 5 columns, while validation$classe is a factor with 5 levels. Any ideas on how to solve this?

Thanks in advance


Solution

  • Your call to predict is returning a matrix of class probabilities, one column per class, which is the default for an rpart classification tree. confusionMatrix can't compare that matrix against a factor, hence the "no overlapping levels" error. To get the predicted class itself (the "winner") instead, pass type="class":

    pred1 <- predict(decisionTreeModel, validation, type="class")
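    Putting it together, the evaluation step could look like the sketch below. Note that in caret, confusionMatrix(data, reference) expects the predictions as the first argument and the true values as the second; the factor() wrapper is a defensive assumption in case validation$classe was read in as character rather than factor (harmless if it is already a factor).

    ```r
    library(caret)
    library(rpart)

    # type="class" returns a factor of predicted labels
    # instead of a matrix of per-class probabilities
    pred1 <- predict(decisionTreeModel, validation, type="class")

    # pred1 now shares levels with the reference, so the
    # levels overlap and confusionMatrix() works.
    # Argument order in caret: prediction first, reference second.
    confusionMatrix(pred1, factor(validation$classe))
    ```

    Separately, cp=0.5 is a very aggressive complexity penalty and may prune the tree down to a single node; rpart's default of cp=0.01 is a more usual starting point if the resulting tree looks degenerate.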