How would I get the pattern of errors on test items for a logistic regression model?


I am trying to analyse the pattern of errors on test items for the model I coded below. Specifically, I would like to find out how often setosa and versicolor irises are incorrectly classified as virginica, and how often virginica irises are incorrectly classified as not virginica. Can this be done? Any suggestions would be great. Here are my logistic regression model and a classifier built from it:

library(datasets)
iris$dummy_virginica_iris <- 0
iris$dummy_virginica_iris[iris$Species == 'virginica'] <- 1
iris$dummy_virginica_iris

# Logistic regression model (named fit so it does not mask the glm() function).
fit <- glm(dummy_virginica_iris ~ Petal.Width + Sepal.Width,
           data = iris,
           family = 'binomial')
summary(fit)

# Classifier: predicted probability of virginica, thresholded at 0.5.
glm.pred <- predict(fit, type = "response")
virginica <- glm.pred > 0.5

Solution

  • You can create a new vector to separate the flowers into virginica / non-virginica like this:

    species <- as.character(iris$Species)
    species[species != "virginica"] <- "non-virginica"
    

    Then you can just tabulate this against your model's predictions as a 2 x 2 contingency table:

    result <- table(virginica, species)
    print(result)
    #          species
    # virginica non-virginica virginica
    #     FALSE            96         3
    #     TRUE              4        47
    

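    The off-diagonal cells answer your question directly: 4 non-virginica flowers were misclassified as virginica, and 3 virginica flowers were misclassified as non-virginica. The table also makes it easy to calculate the sensitivity, specificity and accuracy of your model: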

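    # Sensitivity: share of true virginica predicted as virginica.
    sensitivity <- result[2, 2] / sum(result[, 2])
    # Specificity: share of non-virginica predicted as non-virginica.
    specificity <- result[1, 1] / sum(result[, 1])
    # Accuracy: share of all flowers classified correctly.
    accuracy    <- (result[1, 1] + result[2, 2]) / sum(result)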
    sensitivity
    # [1] 0.94
    specificity
    # [1] 0.96
    accuracy
    # [1] 0.9533333
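If you also want setosa and versicolor errors counted separately, one option is to tabulate the predictions against the original three-level `Species` factor instead of the collapsed two-level vector. The sketch below refits the question's model so it runs on its own; the names `df`, `by_species`, `false_pos_rate` and `false_neg_rate` are illustrative choices, not part of the original code. Wrapping the predictions in `factor(..., levels = c(FALSE, TRUE))` keeps both rows in the table even if one outcome never occurs.

```r
library(datasets)

# Refit the question's model so this block is self-contained.
df <- iris
df$is_virginica <- as.integer(df$Species == "virginica")
fit <- glm(is_virginica ~ Petal.Width + Sepal.Width,
           data = df, family = binomial)
virginica <- predict(fit, type = "response") > 0.5

# Cross-tabulate predictions against all three true species,
# forcing both FALSE and TRUE rows to be present.
by_species <- table(predicted_virginica = factor(virginica, levels = c(FALSE, TRUE)),
                    species = df$Species)
print(by_species)

# Non-virginica flowers (setosa + versicolor) predicted as virginica.
false_pos <- sum(by_species["TRUE", c("setosa", "versicolor")])
# Virginica flowers predicted as not virginica.
false_neg <- by_species["FALSE", "virginica"]

# Turn the counts into rates within each true group.
false_pos_rate <- false_pos / sum(by_species[, c("setosa", "versicolor")])
false_neg_rate <- false_neg / sum(by_species[, "virginica"])
false_pos_rate
false_neg_rate
```

Reading down each column of `by_species` shows exactly which species the misclassified flowers came from.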
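Note that the model is scored on the same 150 flowers it was fitted on, so these are training errors rather than errors on unseen test items. A minimal sketch of a held-out evaluation follows, assuming a simple random 70/30 split; the seed, the split ratio and the names `train_idx`, `train`, `test` and `test_table` are arbitrary choices for illustration.

```r
library(datasets)

df <- iris
df$is_virginica <- as.integer(df$Species == "virginica")

# Hold out 30% of the flowers as test items (seed and ratio are arbitrary).
set.seed(42)
train_idx <- sample(nrow(df), size = round(0.7 * nrow(df)))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Fit on the training rows only.
fit <- glm(is_virginica ~ Petal.Width + Sepal.Width,
           data = train, family = binomial)

# Predict probabilities for the unseen test rows and classify at 0.5.
test_prob <- predict(fit, newdata = test, type = "response")
test_pred <- factor(test_prob > 0.5, levels = c(FALSE, TRUE))

# Error pattern on the test items, broken down by true species.
test_table <- table(predicted_virginica = test_pred, species = test$Species)
print(test_table)

# Overall test accuracy.
test_accuracy <- mean((test_prob > 0.5) == (test$is_virginica == 1))
test_accuracy
```

With only 150 flowers a single split is noisy; repeating it with different seeds (or using cross-validation) gives a steadier picture of the error pattern.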