r logistic-regression

Result of glm() for logistic regression


This might be a trivial question, but I don't know where to find the answer. When using glm() for logistic regression in R, if the response variable Y is a factor with values 1 and 2, does the result of glm() correspond to logit(P(Y=1)) or logit(P(Y=2))? What if Y is logical, with values TRUE and FALSE?


Solution

  • Why not just test it yourself?

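    # The same 200 outcomes coded three ways: logical, numeric 1/2, and factor.
    # The first 100 observations hold 25 TRUE (or 2), the last 100 hold 75.
    output_bool <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
    output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
    output_fact <- factor(output_num)  # levels are "1", "2", in that order
    var <- rep(c("unlikely", "likely"), each = 100)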
    
    glm(output_bool ~ var, binomial)
    #> 
    #> Call:  glm(formula = output_bool ~ var, family = binomial)
    #> 
    #> Coefficients:
    #> (Intercept)  varunlikely  
    #>       1.099       -2.197  
    #> 
    #> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
    #> Null Deviance:       277.3 
    #> Residual Deviance: 224.9     AIC: 228.9
    glm(output_num ~ var, binomial)
    #> Error in eval(family$initialize): y values must be 0 <= y <= 1
    glm(output_fact ~ var, binomial)
    #> 
    #> Call:  glm(formula = output_fact ~ var, family = binomial)
    #> 
    #> Coefficients:
    #> (Intercept)  varunlikely  
    #>       1.099       -2.197  
    #> 
    #> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
    #> Null Deviance:       277.3 
    #> Residual Deviance: 224.9     AIC: 228.9
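
    To see which outcome these coefficients describe, they can be rebuilt by hand from the observed proportions. This is a minimal sketch on the same data, not part of the original reprex; the names `p_likely`, `p_unlikely`, `fit` and `by_hand` are introduced here purely for illustration.

    ```r
    # Same data as in the reprex
    output_bool <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
    var <- rep(c("unlikely", "likely"), each = 100)

    # Observed proportion of TRUE in each group
    p_likely   <- mean(output_bool[var == "likely"])    # 75 of 100
    p_unlikely <- mean(output_bool[var == "unlikely"])  # 25 of 100

    fit <- glm(output_bool ~ var, family = binomial)

    # qlogis() is the logit function. "likely" is the reference group because
    # it sorts first, so if glm() models P(Y = TRUE) the intercept should be
    # logit(p_likely) and the slope the difference between the two logits.
    by_hand <- c(qlogis(p_likely), qlogis(p_unlikely) - qlogis(p_likely))

    # Compare with the fitted coefficients, allowing for numerical tolerance
    all.equal(unname(coef(fit)), by_hand, tolerance = 1e-4)
    ```

    If the comparison holds, the intercept is the log-odds of TRUE in the reference group, which is what "modelling P(Y = TRUE)" means in practice.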
    

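    So TRUE and FALSE work as expected, with TRUE as the modelled event: the coefficients describe logit(P(Y = TRUE)). Numeric 1 and 2 give an error, because a numeric response must lie between 0 and 1. A factor with levels 1 and 2 works: glm() treats the first level as "failure" and any other level as "success" (see the Details section of ?family), so here the coefficients describe logit(P(Y = 2)). This matches the logical result only because the level corresponding to TRUE comes second. We have to be careful about how the factor levels are ordered, because reversing them changes which outcome is modelled: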

    output_fact <- factor(output_fact, levels = c("2", "1"))
    glm(output_fact ~ var, binomial)
    #> 
    #> Call:  glm(formula = output_fact ~ var, family = binomial)
    #> 
    #> Coefficients:
    #> (Intercept)  varunlikely  
    #>      -1.099        2.197  
    #> 
    #> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
    #> Null Deviance:       277.3 
    #> Residual Deviance: 224.9     AIC: 228.9
    

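    (Notice that the intercept and coefficient have flipped signs. With "2" as the first level, the model now describes P(Y = 1) = 1 - P(Y = 2), and logit(1 - p) = -logit(p).)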

    Created on 2020-06-21 by the reprex package (v0.3.0)
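
    If relying on level order feels fragile, one option is to state the modelled event explicitly. The following is only a sketch under the same data as above, not part of the original reprex; the names `is_two`, `y01`, `fact_ref2` and the `fit_*` objects are hypothetical. It shows a logical comparison, a 0/1 recoding of the numeric response (which avoids the "y values must be 0 <= y <= 1" error), and relevel() to choose the first level deliberately.

    ```r
    output_num  <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
    output_fact <- factor(output_num)
    var <- rep(c("unlikely", "likely"), each = 100)

    # The first level is treated as "failure"; any other level as "success"
    levels(output_fact)

    # Option 1: a logical response, so TRUE (here Y = 2) is the modelled event
    is_two  <- output_fact == "2"
    fit_is2 <- glm(is_two ~ var, family = binomial)

    # Option 2: recode the numeric 1/2 response to 0/1
    y01    <- output_num - 1
    fit_01 <- glm(y01 ~ var, family = binomial)

    # Option 3: make "2" the first level, so Y = 1 becomes the modelled event
    fact_ref2 <- relevel(output_fact, ref = "2")
    fit_is1   <- glm(fact_ref2 ~ var, family = binomial)

    # Options 1 and 2 describe the same event; option 3 describes the
    # complementary one, so its coefficients should differ only in sign
    all.equal(unname(coef(fit_is2)), unname(coef(fit_01)))
    all.equal(unname(coef(fit_is2)), -unname(coef(fit_is1)), tolerance = 1e-6)

    # Fitted probabilities of Y = 2 per group, next to the observed proportions
    tapply(fitted(fit_is2), var, mean)
    tapply(output_num == 2, var, mean)
    ```

    Checking levels() before fitting, or comparing fitted probabilities with observed proportions as in the last two lines, is a quick way to confirm which outcome a given model describes.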