I am trying to fit a glm model to the iris dataset using the following code:
library(tidyverse)
library(caret)
dataset <- iris
tt_index <- createDataPartition(dataset$Sepal.Length, times = 1, p = 0.9, list = FALSE)
train_set <- dataset[tt_index, ]
test_set <- dataset[-tt_index, ]
model_glm <- train(Species ~ .,
                   data = train_set,
                   method = "glm")
But it returned this message:
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1
Perhaps I am missing something. Any help would be greatly appreciated.
You're trying to fit a binary (i.e. binomial) classification model to data whose response variable has more than two levels: Species in iris has three. The warnings you are getting, which you can see by typing warnings(), will tell you that
glm models can only use 2-class outcomes
So this won't work.
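A quick way to confirm this before calling train() is to count the classes in the response. This is a minimal check using base R only:

```r
# Inspect the response variable before choosing a model family
levels(iris$Species)    # "setosa" "versicolor" "virginica"
nlevels(iris$Species)   # 3, one more than a binomial glm can handle
```

Anything above two levels here means method = "glm" will fail in the way shown in the question.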
One option is to drop one of the classes, e.g.
# Keep two classes only and drop the unused factor level
dataset <- subset(iris, Species != "virginica")
dataset <- transform(dataset, Species = droplevels(Species))
tt_index <- createDataPartition(
  dataset$Sepal.Length, times = 1, p = 0.5, list = FALSE)
train_set <- dataset[tt_index, ]
test_set <- dataset[-tt_index, ]
# family = "binomial" is passed through to glm()
model_glm <- train(
  Species ~ .,
  data = train_set,
  method = "glm",
  family = "binomial")
This will still give warnings, but they have a different origin (the two remaining classes are perfectly separable). The bottom line is that this is probably not a very good example for testing glm-based binomial classification.
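The remaining warnings most likely come from perfect separation: setosa and versicolor do not overlap at all in Petal.Length, so the logistic fit pushes the fitted probabilities towards 0 and 1. Here is a rough sketch of how to check that, using base R only. Swapping in the versicolor/virginica pair is my own illustrative choice, not something the original code does, and the name overlap_set is made up for this example.

```r
# Base R only; iris ships with the datasets package.
setosa     <- iris$Petal.Length[iris$Species == "setosa"]
versicolor <- iris$Petal.Length[iris$Species == "versicolor"]
virginica  <- iris$Petal.Length[iris$Species == "virginica"]

# setosa vs. versicolor: no overlap, so Petal.Length alone separates them
max(setosa) < min(versicolor)      # TRUE

# versicolor vs. virginica: the ranges overlap
max(versicolor) > min(virginica)   # TRUE

# Assumption: use the overlapping pair as a two-class problem instead
overlap_set <- droplevels(subset(iris, Species != "setosa"))
fit <- glm(Species ~ ., data = overlap_set, family = binomial)
summary(fit)
```

Because these two classes overlap, this pair is probably a more reasonable test case for a binomial glm, and the same subset could be passed to train() with method = "glm".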