Tags: r, regression, tidyr, purrr, gtsummary

Using `tbl_regression` for multivariable regression on nested data frames


I am running multivariable regressions on a list of outcome variables, each with the same set of independent variables. For univariable regression, I followed this example to use tbl_uvregression from gtsummary on a nested data frame. I am now trying to generalize this to multivariable regression using tbl_regression on a nested data frame, but when I try to unnest the tables, I get the error that "the input must be a list of vectors." Below is what I have tried. I assume there is some small but critical step that I'm missing, but I can't figure out what it is. My desired output is a table of multivariable regression results, with each model as a column and all the covariates as rows (similar to calling tbl_merge on a list of these multivariable models, each run separately through tbl_regression).

library(tidyverse)
library(magrittr)
library(gtsummary)
library(broom)

id <- 1:2000
gender <- sample(0:1, 2000, replace = T)
age <- sample(17:64, 2000, replace = T)
race <- sample(0:1, 2000, replace = T)
health_score <- sample(0:25, 2000, replace = T)
cond_a <- sample(0:1, 2000, replace = T)
cond_b <- sample(0:1, 2000, replace = T)
cond_c <- sample(0:1, 2000, replace = T)
cond_d <- sample(0:1, 2000, replace = T)
df <- data.frame(id, gender, age, race, health_score, cond_a, cond_b, cond_c, cond_d)

regression_tables <- df %>% select(-id) %>% 
  gather(c(cond_a, cond_b, cond_c, cond_d), key = "condition", value = "case") %>% 
  group_by(condition) %>% nest() %>% 
  mutate(model = map(data, ~glm(case ~ gender + age + race + health_score, family = "binomial", data = .)),
         table = map(model, ~tbl_regression, exponentiate = T, conf.level = 0.99)) %>% 
  select(table) %>% unnest(table)

Solution

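  • The issue is the lambda expression (~) in map(model, ~tbl_regression, ...). The body of that lambda never calls the function: it simply returns tbl_regression itself, and the extra arguments (exponentiate, conf.level) are silently ignored. The table column therefore holds functions rather than tables, which is consistent with unnest complaining that the input must be a list of vectors. Either pass the function by name together with its arguments, or call it inside the lambda, i.e. ~tbl_regression(.x, exponentiate = TRUE, conf.level = 0.99). Also, broom provides no tidy method for tbl_regression objects, so unnest has no way to expand the tables into a tibble; keep them in the list column and extract them from there.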

    library(dplyr)
    library(tidyr)
    library(purrr)   # provides map()
    library(broom)
    library(gtsummary)

    out <- df %>%
      select(-id) %>%
      gather(c(cond_a, cond_b, cond_c, cond_d), key = "condition",
             value = "case") %>%
      group_by(condition) %>%
      nest() %>%
      # fit one logistic model per condition
      mutate(model = map(data,
                         ~glm(case ~ gender + age + race + health_score,
                              family = "binomial", data = .)),
             # pass tbl_regression by name so the extra arguments reach it
             table = map(model, tbl_regression, exponentiate = TRUE, conf.level = 0.99)) %>%
      select(table)
    
    out$table[[1]]
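
The difference between the two ways of calling map can be checked on a toy example that does not involve gtsummary at all. This is only a minimal sketch: the input list and the use of round are stand-ins chosen for illustration and are not part of the original question.

```r
library(purrr)

# Toy inputs standing in for the list of fitted models (illustrative only).
x <- list(1.234, 5.678)

# `~round` builds a lambda whose body is just the symbol `round`, so every
# call returns the function itself; `digits = 1` is silently absorbed.
wrong <- map(x, ~round, digits = 1)

# Passing the function by name forwards the extra argument to it.
by_name <- map(x, round, digits = 1)

# Equivalent lambda form: the function is actually called on `.x`.
lambda <- map(x, ~round(.x, digits = 1))

is.function(wrong[[1]])      # TRUE: a list of functions, not of results
identical(by_name, lambda)   # TRUE: both forms give the same rounded values
```

The same reasoning applies when round is swapped for tbl_regression and the toy list for the model column.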
    



    As an alternative to looping with map, we can fit the model and call tbl_regression directly after nest_by, which returns a rowwise data frame. Here gather is replaced with pivot_longer, because gather is superseded and pivot_longer is its more general successor.

    out <- df %>%
      select(-id) %>%
      pivot_longer(cols = starts_with('cond'),
                   names_to = 'condition', values_to = 'case') %>%
      nest_by(condition) %>%
      mutate(model = list(glm(case ~ gender + age + race + health_score,
                              family = "binomial", data = data)),
             table_out = list(tbl_regression(model, exponentiate = TRUE, conf.level = 0.99)))
    
    out
    # A tibble: 4 x 4
    # Rowwise:  condition
    #  condition               data model  table_out 
    #  <chr>     <list<tbl_df[,5]>> <list> <list>    
    #1 cond_a           [2,000 × 5] <glm>  <tbl_rgrs>
    #2 cond_b           [2,000 × 5] <glm>  <tbl_rgrs>
    #3 cond_c           [2,000 × 5] <glm>  <tbl_rgrs>
    #4 cond_d           [2,000 × 5] <glm>  <tbl_rgrs>
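
If a plain data frame is wanted rather than a list column of gtsummary objects, one possible route is to convert each table with as_tibble before unnesting. The sketch below is not from the original answer: it assumes the as_tibble method that gtsummary provides for its tables, and it uses a smaller simulated data set (n, the seed, and the reduced covariate list are arbitrary choices) so that it runs on its own.

```r
library(dplyr)
library(tidyr)
library(gtsummary)

set.seed(1)   # arbitrary seed, only to make the simulated data reproducible
n <- 500      # smaller than in the question, chosen for speed
df <- data.frame(
  gender = sample(0:1, n, replace = TRUE),
  age    = sample(17:64, n, replace = TRUE),
  cond_a = sample(0:1, n, replace = TRUE),
  cond_b = sample(0:1, n, replace = TRUE)
)

flat <- df %>%
  pivot_longer(starts_with("cond"),
               names_to = "condition", values_to = "case") %>%
  nest_by(condition) %>%
  mutate(
    model     = list(glm(case ~ gender + age, family = "binomial", data = data)),
    table_out = list(tbl_regression(model, exponentiate = TRUE, conf.level = 0.99)),
    # convert the formatted table into an ordinary tibble
    table_df  = list(as_tibble(table_out))
  ) %>%
  ungroup() %>%
  select(condition, table_df) %>%
  unnest(table_df)   # works because table_df holds tibbles, not gtsummary objects
```

The result has one row per covariate and condition, with the formatted estimates stored as text, so it suits further reshaping more than presentation.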
    

    If we need a single merged table with one set of columns per model, pass the list of tbl_regression objects to tbl_merge:

    tbl_merge(out$table_out)
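
By default the merged columns are not labelled with the outcome names. A possible refinement, sketched here under the assumption that the tab_spanner argument of tbl_merge is used for the spanning headers, is to build the labels from the condition column. The simulated data below (n, the seed, and the reduced covariate list) are illustrative choices so that the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(gtsummary)

set.seed(1)   # arbitrary seed, only to make the simulated data reproducible
n <- 500      # smaller than in the question, chosen for speed
df <- data.frame(
  gender = sample(0:1, n, replace = TRUE),
  age    = sample(17:64, n, replace = TRUE),
  cond_a = sample(0:1, n, replace = TRUE),
  cond_b = sample(0:1, n, replace = TRUE)
)

out <- df %>%
  pivot_longer(starts_with("cond"),
               names_to = "condition", values_to = "case") %>%
  nest_by(condition) %>%
  mutate(
    model     = list(glm(case ~ gender + age, family = "binomial", data = data)),
    table_out = list(tbl_regression(model, exponentiate = TRUE, conf.level = 0.99))
  )

# One spanning header per model, taken from the outcome names;
# the ** markers render the header in bold.
merged <- tbl_merge(
  tbls        = out$table_out,
  tab_spanner = paste0("**", out$condition, "**")
)
```

Printing merged renders the combined table, with covariates as rows and one column group per condition.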
    
