Apply glm to iris dataset in caret R


I am trying to fit a glm model to the iris dataset with caret, using the following code:

library(tidyverse)
library(caret)

dataset <- iris
tt_index <- createDataPartition(dataset$Sepal.Length, times = 1, p = 0.9, list = FALSE)
train_set <- dataset[tt_index, ]
test_set <- dataset[-tt_index, ]

model_glm <- train(Species ~., 
                   data = train_set,
                   method = "glm")

But it returned this message:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    

Perhaps I am missing something; any help would be greatly appreciated.


Solution

  • You're trying to fit a binary (i.e. binomial) classification model to data whose response variable has more than two levels (Species has three). The warnings you are getting, which you can view by typing warnings(), tell you that

    glm models can only use 2-class outcomes

    So this won't work.

    One option is to drop one of the classes, e.g.:

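    library(caret)

    # Keep two of the three species and drop the now-empty factor level
    dataset <- subset(iris, Species != "virginica")
    dataset <- transform(dataset, Species = droplevels(Species))
    
    # Split the rows into training and test sets (50/50 here)
    tt_index <- createDataPartition(
        dataset$Sepal.Length, times = 1, p = 0.5, list = FALSE)
    train_set <- dataset[tt_index, ]
    test_set <- dataset[-tt_index, ]
    
    # Fit a logistic regression; family is passed through to glm()
    model_glm <- train(
        Species ~.,
        data = train_set,
        method = "glm",
        family = "binomial")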
    

    This will still give warnings, but they have a different origin. The bottom line is that this is probably not a very good example for testing glm-based binomial classification.
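
The answer does not say where the remaining warnings come from. A likely cause, stated here as an assumption rather than something the answer confirms, is that setosa and versicolor are perfectly separable on the petal measurements, so the logistic regression coefficients diverge during fitting. The sketch below checks this with base R only (no caret); the names two_class, petal_ranges, separable and fit_warnings are made up for illustration.

```r
# Base R only; iris ships with the datasets package.
two_class <- droplevels(subset(iris, Species != "virginica"))

# Range of Petal.Length within each of the two remaining species.
petal_ranges <- tapply(two_class$Petal.Length, two_class$Species, range)
print(petal_ranges)

# TRUE when the longest setosa petal is shorter than the shortest
# versicolor petal, i.e. one threshold separates the classes perfectly.
separable <- max(two_class$Petal.Length[two_class$Species == "setosa"]) <
  min(two_class$Petal.Length[two_class$Species == "versicolor"])
print(separable)

# Fit a plain logistic regression and collect any warnings it raises
# instead of letting them print.
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(Species ~ ., data = two_class, family = binomial),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(fit_warnings)
```

If separable is TRUE and fit_warnings is non-empty, the warnings point to the data being too easy for an unpenalized logistic regression, not to a problem with the call to train().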