r distribution logistic-regression statistical-test

Perform statistical analysis using binomial distribution


I want to compare my logistic regression model against a "random" model that responds "virginica" 50% of the time, "setosa" 25% of the time, and "versicolor" the remaining 25% of the time. I am trying to use the binomial distribution to test which of the two is more accurate. Can this be done? Here is my attempt:

library(datasets)
iris$dummy_virginica_iris <- 0
iris$dummy_virginica_iris[iris$Species == 'virginica'] <- 1
iris$dummy_virginica_iris

# Logistic regression model.
glm <- glm(dummy_virginica_iris ~ Petal.Width + Sepal.Width, 
        data = iris, 
        family = 'binomial') 
summary(glm)

# Classifier.
glm.pred <- predict(glm, type="response")
virginica <- ifelse(glm.pred > .5, TRUE, FALSE)
table(iris$Species, virginica)

# Table of predictions.
table(virginica, iris$dummy_virginica_iris)

# Binomial distribution?? One 0/1 draw per row of iris (150 rows).
rbinom(nrow(iris), 1, 0.5)

Solution

  • You can use sample() to generate the random model's responses:

    set.seed(1)
    
    rando <- sample(c('virginica', 'setosa', 'versicolor'),  # vector of possible responses
                    prob = c(1/2, 1/4, 1/4),  # probabilities of those responses
                    size = length(virginica),  # number of responses desired
                    replace = TRUE)  # specify sampling with replacement
    
    table(rando, iris$dummy_virginica_iris)
    
    rando         0  1
      setosa     27  8
      versicolor 21 18
      virginica  52 24
    
    rando_virginica <- ifelse(rando == 'virginica', TRUE, FALSE)
    table(rando_virginica, iris$dummy_virginica_iris)
    
    rando_virginica  0  1
              FALSE 48 26
              TRUE  52 24
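
The tables above show the random responses but stop short of the test the question asks about. One possible way to finish the comparison, offered as a sketch rather than a definitive method, is an exact binomial test with binom.test(). On the binary virginica / not-virginica task, a guesser that says "virginica" half the time is right with probability 1/2 whatever the true class is, so that value can serve as the null accuracy. The names d, fit, model_pred, model_correct and p_random are introduced here for illustration and are not from the original post. Note that the model is scored on the same data it was fitted to, so its accuracy is likely optimistic.

```r
library(datasets)

# Copy of iris with the binary target used in the question (1 = virginica).
d <- iris
d$dummy_virginica_iris <- as.integer(d$Species == "virginica")

# Same logistic regression as in the question, thresholded at 0.5.
fit <- glm(dummy_virginica_iris ~ Petal.Width + Sepal.Width,
           data = d, family = "binomial")
model_pred <- predict(fit, type = "response") > 0.5

# Count how many rows the model classifies correctly.
n <- nrow(d)
model_correct <- sum(model_pred == (d$dummy_virginica_iris == 1))

# Assumed null accuracy: the random guesser is right half the time
# on the binary task, because it says "virginica" with probability 1/2.
p_random <- 0.5

# One-sided exact binomial test: is the model's accuracy above p_random?
result <- binom.test(model_correct, n, p = p_random, alternative = "greater")
result$estimate  # observed in-sample accuracy of the model
result$p.value   # small values suggest the model beats random guessing
```

A small p-value here only says the model's in-sample accuracy exceeds 50%; scoring the model on held-out rows would give a fairer comparison.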