Search code examples
rnestedmodels

Unnest fitted glm models


I have a tibble with nested glm models. I nest over a variable (region) and run a function region_model that fits the model.

# toy data
test_data = data.frame(region = sample(letters[1:3], 1000, replace = TRUE),
              x = sample(0:1, 1000, replace = TRUE), 
                                   y = sample(1:100, 1000, replace = TRUE), 
                                   z = sample(0:1, 1000, replace = TRUE)) %>% arrange(region)

# nest
by_region = test_data %>%
              group_by(region) %>%
              nest()


# glm function 
region_model  <- function(df) {
 glm(x ~ y + z, data = df, family = "binomial")
}              

# run the model  
    by_region = by_region %>% mutate(mod_rat = data %>% map(region_model))

The resulting tibble looks like this:

> by_region
# A tibble: 3 x 3
  region               data   mod_rat
  <fctr>             <list>    <list>
1      a <tibble [352 x 3]> <S3: glm>
2      b <tibble [329 x 3]> <S3: glm>
3      c <tibble [319 x 3]> <S3: glm>  

My purpose is to unnest the models to calculate marginal effects. I have tried it and I have got this error:

> unnest(by_region, mod_rat)
Error: Each column must either be a list of vectors or a list of data frames [mod_rat]

I wonder whether it possible to use unnest on this type of objects (<S3: glm>) and in case not, whether there is an alternative to get these estimates.


Solution

  • As it happens, the margins package has had some recent updates which will help you do this in a tidy fashion. In particular a margins_summary() function has been added that can be mapped onto nested model objects.

    This issue on GitHub has the details.

    Here is some code that works with your example

    Using data from above

    library(tidyverse)
    library(magrittr)
    library(margins)
    
    # toy data
    test_data <- data.frame(region = sample(letters[1:3], 1000, replace = TRUE),
                                 x = sample(0:1, 1000, replace = TRUE), 
                                 y = sample(1:100, 1000, replace = TRUE), 
                                 z = sample(0:1, 1000, replace = TRUE)) %>% 
    arrange(region)
    
    # nest
    by_region <- 
        test_data %>%
        group_by(region) %>%
        nest()
    
    # glm function 
    region_model  <- function(df) {
       glm(x ~ y + z, data = df, family = "binomial")
    }              
    
    # run the model  
    by_region %<>% 
      mutate(mod_rat = map(data, region_model))
    

    Using the margins_summary() function via purrr:map2() to compute marginal effects (I have included both methods for calculating the marginal effects with logistic regression as described in the package vignette)

    by_region %<>% 
        mutate(marginals      = map2(mod_rat, data, ~margins_summary(.x, data = .y)),
               marginals_link = map2(mod_rat, data, ~margins_summary(.x, data = .y, type = "link")))
    

    We can now unnest either of the created list columns with the marginal effect data

    by_region %>% 
       unnest(marginals) -> region_marginals
       region_marginals
    # A tibble: 6 x 8
      region factor      AME      SE      z     p
        <fct>  <chr>     <dbl>   <dbl>  <dbl> <dbl>
      1 a      y      -9.38e-4 9.71e-4 -0.966 0.334
      2 a      z       3.59e-2 5.55e-2  0.647 0.517
      3 b      y       1.14e-3 9.19e-4  1.24  0.215
      4 b      z      -2.93e-2 5.38e-2 -0.545 0.586 
      5 c      y       4.67e-4 9.77e-4  0.478 0.633
      6 c      z      -3.32e-2 5.49e-2 -0.604 0.546
    # ... with 2 more variables: lower <dbl>,
    #   upper <dbl>
    

    And plot nicely

    region_marginals %>% 
      ggplot(aes(reorder(factor, AME), AME, ymin = lower, ymax = upper)) +
      geom_hline(yintercept = 0, colour = "#AAAAAA") +
      geom_pointrange() +
      facet_wrap(~region) +
      coord_flip()
    

    enter image description here