Search code examples
rlogistic-regressionglm

Fitting a logistic model with multiple dependent vars/LHS


Is there a way to do multivariate logistic regression using glm()? I have several binary outcomes and I know you can do this with linear regression (lm()) and cbind() but I can't seem to figure out how to do it with glm() and cbind()

library(tidyverse)

lm(cbind(mpg, cyl, disp) ~ hp + drat + wt + qsec, data = mtcars) %>% 
  broom::tidy(conf.int = T)

This returns a tidy tibble with your outcomes (mpg, cyl, disp). How can I extend this to logistic regression.

df <- mtcars %>% 
  mutate(across(c(vs, am), factor))

glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

I know this isn't the best dataset to use (probably due to complete separation) but cbind just returns an unexpected output than if you ran the glms separately.

glm(am ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

glm(vs ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df) %>% 
  broom::tidy()

Solution

  • Have a look at

    glm(cbind(am, vs) ~ hp + drat + wt + qsec, family = binomial(link = "logit"), data = df, method = "model.frame") %>% 
    broom::tidy()
    

    which returns

    
    # A tibble: 5 × 13
      column            n   mean     sd median trimmed    mad   min    max  range  skew kurtosis      se
      <chr>         <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>   <dbl>
    1 cbind(am, vs)    64   1.42  0.498   1       1.40  0      1      2      1    0.316     1.10  0.0622
    2 hp               32 147.   68.6   123     141.   52     52    335    283    0.761     3.05 12.1   
    3 drat             32   3.60  0.535   3.70    3.58  0.475  2.76   4.93   2.17 0.279     2.44  0.0945
    4 wt               32   3.22  0.978   3.32    3.15  0.517  1.51   5.42   3.91 0.444     3.17  0.173 
    5 qsec             32  17.8   1.79   17.7    17.8   0.955 14.5   22.9    8.4  0.387     3.55  0.316
    

    In the first line of results in the output above, you can see that glm seems to interpret cbind(am, vs) as a combined dependent variable of am and vs having multiple categories and not being dichotomous (in this example 0 or 1). This kind of dependent variable would require multinomial logistic regression as pointed out by Ben Bolker.

    As you mentioned, if you run seperate glm models with one dependent variable at a time, you get different results and warning messages regarding complete separation as well as fitted probabilities having values (very close to) 0 or 1.

    If you would like to run the two glm models in one go and present results in a single object, we can use purrr:map as shown here which results in the following approach:

    #--------------
    # Load packages
    #--------------
    library(tidyverse)
    
    #--------------
    # Define data, including independent variables & dependent variables
    #--------------
    df <- mtcars
    independent.variables.formula <- "~ hp + drat + wt + qsec"
    dependent.variables <- c("am", "vs")
    
    #--------------
    # create output for both models in a data.frame showing the results
    #--------------
    df_res <- map(dependent.variables, function(DV) {
      paste(DV, independent.variables.formula) %>% 
        as.formula %>% 
        glm(family=binomial(link = "logit"), data = df) %>%
        broom::tidy(conf.int = T)
    }) %>%
      bind_rows() %>%
      as.data.frame () %>%
      mutate (DV = c(rep ("am", 5), rep ("vs", 5)),
              across(c(2:4, 6:7), .fns = function(x) {format(round(x, 2), nsmall = 2)})) %>%
      relocate (DV, .before = term)
    
    df_res[ ,6] <- format.pval(df_res[ ,6], eps = .001, digits = 3) # format small p-values < 0.001 nicely
    
    df_res
    
      DV        term estimate  std.error statistic p.value   conf.low conf.high
    1  am (Intercept)   206.22 1166788.76      0.00   1.000         NA 238011.23
    2  am          hp     0.24     687.34      0.00   1.000    -139.85        NA
    3  am        drat   103.37  132242.73      0.00   0.999  -10359.33  12068.44
    4  am          wt   -94.56   84508.03      0.00   0.999  -54340.19  14115.43
    5  am        qsec   -20.03   34601.06      0.00   1.000   -3473.64   3149.50
    6  vs (Intercept) -1928.91 1845493.29      0.00   0.999 -124797.75 120939.92
    7  vs          hp     1.59    2408.34      0.00   0.999    -489.26        NA
    8  vs        drat    92.35  139959.57      0.00   0.999  -16047.24  16231.94
    9  vs          wt   -82.30   77899.01      0.00   0.999   -5580.25   4963.40
    10 vs        qsec    91.59   75674.95      0.00   0.999   -3734.28   4115.71