
Error: `data` and `reference` should be factors with the same levels. Using confusionMatrix (caret)


I am getting an error when using the confusionMatrix() function from the caret package. To reproduce the example, I use the Sonar dataset from the mlbench package.

library(mlbench)
data(Sonar)

rows <- sample(nrow(Sonar))
Sonar <- Sonar[rows, ]


split <- round(nrow(Sonar) * 0.6)
adiestramiento <- Sonar[1:split, ]
experimental <- Sonar[(split + 1):nrow(Sonar), ]

model <- glm(Class ~ ., family = binomial(link = "logit"), adiestramiento)
p <- predict(model, experimental, type = "response")
p_class <- ifelse(p > 0.5, "M", "R")

library(caret)
confusionMatrix(p_class, experimental[["Class"]])

The error I am getting when running confusionMatrix() is

Error: `data` and `reference` should be factors with the same levels.

I checked that both p_class and experimental[["Class"]] have the same number of elements (83).

Any idea what's going on?


Solution

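  • The issue is that data (here, p_class) has to be a factor. ifelse() returns a character vector, whereas experimental[["Class"]] is a factor with levels "M" and "R". So the predictions should be converted to a factor before calling confusionMatrix():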

    confusionMatrix(factor(p_class), experimental[["Class"]])
    # Confusion Matrix and Statistics
    # 
    #           Reference
    # Prediction  M  R
    #          M 17 20
    #          R 33 13
    # ...
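
A slightly more defensive variant is sketched below. It rests on two points. First, factor(p_class) only produces matching levels if both "M" and "R" actually occur among the predictions, so passing the reference levels explicitly is safer. Second, for a binomial glm() with a factor response, R treats the first level as failure, so the predicted probability refers to the second level ("R" here). That suggests the labels in ifelse(p > 0.5, "M", "R") are inverted, which would be consistent with the low agreement in the matrix above. The vectors truth and p are small hypothetical stand-ins for experimental[["Class"]] and the output of predict(), not the Sonar data.

```r
# Hypothetical stand-in data: `truth` plays the role of experimental[["Class"]],
# `p` the role of predict(model, experimental, type = "response").
truth <- factor(c("M", "R", "R", "M", "R"), levels = c("M", "R"))
p <- c(0.2, 0.9, 0.7, 0.4, 0.1)

# With a factor response, binomial glm() models the probability of the
# second level, so p > 0.5 is mapped to "R" (assumption: levels are M, R).
p_class <- ifelse(p > 0.5, "R", "M")

# Build the factor with the reference's levels, so the levels match
# even if one of the classes is never predicted.
p_class <- factor(p_class, levels = levels(truth))

# Base R cross-tabulation of predictions against the reference.
tab <- table(Prediction = p_class, Reference = truth)
print(tab)

# With caret installed, the same two factors can be passed on directly:
# caret::confusionMatrix(p_class, truth)
```

In the original script the equivalent call would be factor(p_class, levels = levels(experimental[["Class"]])).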