Search code examples
rtidyrpurrrglmbroom

How to efficiently run many logistic regressions in R and skip over equations that throw errors?


As a continuation from this question, I want to run many logistic regression equations at once and then note if a group was significantly different from a reference group. This solution works, but it only works when I'm not missing values. Being that my data has 100 equations, it's bound to have missing values, so rather than this solution failing when it hits an error, how can I program it to skip the instances that throw an error?

Here's a modified dataset that's missing cases:

library(dplyr)
library(lubridate)
library(broom)

test <- tibble(major = as.factor(c(rep(c("undeclared", "computer science", "english"), 2), "undeclared")),
               time = ymd(c(rep("'2021-01-01", 3), rep("'2020-01-01", 3), rep("'2019-01-01", 1))),
               admit = c(500, 1000, 450, 800, 300, 100, 1000),
               reject = c(1000, 300, 1000, 210, 100, 900, 1500)) %>%
  mutate(total = rowSums(test[ , c("admit", "reject")], na.rm=TRUE)) %>%
  mutate(accept_rate = admit/total)

And here's the solution that works when it has all cases (see dataset here), but when it hits the 2019 grouping that's missing cases, it fails:

library(dplyr)
library(lubridate)
library(broom)
library(tidyr)
library(purrr)

test %>% 
  # create year column
  mutate(year = year(time), 
         major = relevel(major, "undeclared")) %>% 
  
  # nest by year
  nest(data = -year) %>% 
  
  # compute regression
  mutate(reg = map(data, ~glm(accept_rate ~ major, data = ., 
                              family = binomial, weights = total, na.action = na.exclude)), 
         
         # use broom::tidy to make a tibble out of model object
         reg_tidy = map(reg, tidy)) %>% 
  
  # get data and regression results back to tibble form
  unnest(c(data, reg_tidy)) %>% 
  filter(term != "(Intercept)") %>%
  
  # create the significant yes/no column
  mutate(significant = ifelse(p.value < 0.05, "Yes", "No")) %>% 
  
  # remove the unnecessary columns
  select(-c(term, estimate, std.error, statistic, p.value, reg))

I also tried wrapping the solution using the custom functions here, but I also couldn't get it to work. Last, I'm also open to other ideas for a solution if it produces a similar output and is resistant to these errors.


Solution

  • To ignore errors, use this function:

    get_model <- function(df) {
      tryCatch(
        glm(accept_rate ~ major, data = df, family = binomial, weights = total, na.action = na.exclude),
        error = function(e) NULL, warning=function(w) NULL)
    }
    

    Use it where you call mutate(reg=map()...):

      # compute regression
      mutate(reg = map(data, get_model), reg_tidy = map(reg, tidy))
    

    Output:

    # A tibble: 4 x 8
       year major            time       admit reject total accept_rate significant
      <dbl> <fct>            <date>     <dbl>  <dbl> <dbl>       <dbl> <chr>      
    1  2021 computer science 2021-01-01  1000    300  1300       0.769 Yes        
    2  2021 english          2021-01-01   450   1000  1450       0.310 No         
    3  2020 computer science 2020-01-01   300    100   400       0.75  No         
    4  2020 english          2020-01-01   100    900  1000       0.1   Yes