Search code examples
rtidyversepurrrgtsummarygt

create multiple cross tables with one-line code function with gtsummary


i'm having the following problem:

Context: I'm using gtsummary to explore frequencies in a dataframe using cross variables.

Here's my desire output: desire table

So that i have a main variable tobgp and its cross by multiple variables like agegp and algp

Attempt: this is what i've done so far. Using the esoph data from the package The R Datasets Package (datasets).

pacman::p_load(tidyverse, gt, gtsummary)

multiple_table<-function(data, var){

t0<- data %>% 
  select({{var}}) %>% 
  gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
                         digits = list(everything() ~ c(2, 0))) %>%
  modify_header(label ~ "") %>% 
  bold_labels()

#agep
t1<-data %>% 
  select({{var}}, agegp) %>% 
  gtsummary::tbl_summary(by = agegp, statistic = all_categorical()~ "{p}% ({n})",
                         digits = list(everything() ~ c(2, 0)))


#alcgp
t2<-data %>% 
  select({{var}}, alcgp) %>% 
  gtsummary::tbl_summary(by = alcgp, statistic = all_categorical()~ "{p}% ({n})",
                         digits = list(everything() ~ c(2, 0)))

#MERGE
tbl_merge(tbls = list(t0,t1,t2),
          tab_spanner = c("**Total**", "**agegp**", "**algp**")) %>%
  as_gt() %>% 
  gt::tab_source_note(gt::md("*Fuente: Empresa1*"))

}

esoph %>% 
  multiple_table(tobgp)

The problem with my code so far is that is specific for the crosses, to add more cross variables i have to modify the function i created which is not so friendly.

Request: Create a function so that you can create the desire output with one line of code. Like this for example:

multiple_table(data, main, by)    

esoph %>%
    multiple_table(main=tobgp, by=c(agegp, algp)

So that if i want to use other variables to cross by i only have to change the by=c() argument. In order to be easy to do something like:

esoph %>%
    multiple_table(main=tobgp, by=c(agegp, algp, variable1, variable2)

Notes:

  • I've tried other functions inside gtsummary like tbl_strata which can use two variables as crosses, but doesn't suit my needs because it mixes the two cross variables like this: the table i don't want

This is not what i'm looking for. As you can see, Grade divides the percentage of Drug test by each Grade. This example is taken from gtsummary vignette: https://www.danieldsjoberg.com/gtsummary/reference/tbl_strata.html

  • I think the solution for my problem could involve some workaround with purrr, or apply, i've tried some but i'm not very good using lists and iterations.

That's it. Thanks very much for listening and i hope i've been very clear. If not, feel free to ask.

Answers 28/03/22

Since i posted my question i've recieve to different approach answers which both work perfectly. Feel free to use the one that suits you. Thanks Mike for the answer in StackOverflow and thanks Tan, June C, Tyler Grant Smith for the answer in the Slack R4DS Community. In my case i would stick with the approach 3.

Approach 1: The Mike approach

library(gtsummary)
library(dplyr)
esoph <- mutate(esoph,
                ncases = ifelse(ncases > 2, "High","Low"))

multiple_table<-function(data, var, vars){

  t0 <- data %>%
    select( var  ) %>%
    gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
                           digits = list(everything() ~ c(2, 0))) %>%
    modify_header(label ~ "") %>%
    bold_labels()


  tlist <-  lapply(vars,function(y){
    data %>%
      select( var  ,  y  ) %>%
      gtsummary::tbl_summary(by =  y  , statistic = all_categorical()~ "{p}% ({n})",
                             digits = list(everything() ~ c(2, 0)))
  })

  tabspannername <- c("**Total**", paste0("**",vars,"**"))

  tlist2 <- append(list(t0), tlist,1)


  tbl_merge(tbls = tlist2
            ,tab_spanner = tabspannername
  ) %>%
    as_gt() %>%
    gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
}

multiple_table(data = esoph, var = "tobgp", vars = c("agegp", "alcgp","ncases"))

Approach 2: The Tan approach

library(tidyverse)
library(gt)
library(gtsummary)

esoph

fn_subtable <- function(data, main, sub){
  data %>%
    dplyr::select({{main}},{{sub}}) %>%
    gtsummary::tbl_summary(
      by = {{sub}},
      statistic = gtsummary::all_categorical()~ "{p}% ({n})",
      digits = list(dplyr::everything() ~ c(2, 0)))

}

fn_table <-function(data, main_var, sub_vars){

  t0 <- data %>%
    dplyr::select({{main_var}}) %>%
    gtsummary::tbl_summary(statistic = gtsummary::all_categorical() ~ "{p}% ({n})",
                           digits = list(dplyr::everything() ~ c(2, 0))) %>%
    gtsummary::modify_header(label ~ "") %>%
    gtsummary::bold_labels()

  sub_tables <- purrr::map(sub_vars, ~fn_subtable(data = data, main = main_var, sub = .x))

  #MERGE
  tbls <- c(list(t0), sub_tables) %>%
    gtsummary::tbl_merge(tab_spanner = c("**Total**", paste0("**",sub_vars,"**"))) %>%
    gtsummary::as_gt() %>%
    gt::tab_source_note(gt::md("*Fuente: Empresa1*"))

  tbls

}

esoph %>% fn_table("tobgp", c("agegp", "alcgp"))

Approach 3: The June C - Tyler Grant Smith approach

library(tidyverse)
library(gt)
library(gtsummary)

fn_subtable <- function(data, main, sub){
  data %>% 
    dplyr::select({{main}},{{sub}}) %>% 
    gtsummary::tbl_summary(
      by = {{sub}}, 
      statistic = gtsummary::all_categorical()~ "{p}% ({n})",
      digits = list(dplyr::everything() ~ c(2, 0)))
  
}

fn_table3 <- function(data, main_var, sub_vars){
  
  main_var <- rlang::enexpr(main_var)
  sub_vars_expr <- rlang::enexpr(sub_vars)         # 1. Capture `list(...)` call as expression
  sub_vars_args <- rlang::call_args(sub_vars_expr) # 2. Pull out the arguments (they're now also exprs)
  sub_vars_fn   <- rlang::call_fn(sub_vars_expr)   # 3. Pull out the fn call
  # 4. Evaluate the fn with expr-ed arguments (this becomes `list( expr(agegp), expr(alcgp) )` )
  sub_vars_reconstructed <- rlang::exec(sub_vars_fn, !!!sub_vars_args)
  
  # --- sub_vars replaced with sub_vars_reconstructed from here onwards ---
  
  t0 <- data %>% 
    dplyr::select({{main_var}}) %>% 
    gtsummary::tbl_summary(statistic = gtsummary::all_categorical() ~ "{p}% ({n})",
                           digits = list(dplyr::everything() ~ c(2, 0))) %>%
    gtsummary::modify_header(label ~ "") %>% 
    gtsummary::bold_labels()
  
  sub_tables <- purrr::map(sub_vars_reconstructed, ~fn_subtable(data = data, main = main_var, sub = .x))
  
  tbls <-  c(list(t0), sub_tables) %>% 
    gtsummary::tbl_merge(tab_spanner = c("**Total**", paste0("**",sub_vars_reconstructed,"**"))) %>%
    gtsummary::as_gt() %>% 
    gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
  
  tbls
  
}

fn_table3(esoph,tobgp,list(agegp,alcgp))

Thanks very much and i hope this could be implemented as a function inside the gtsummary package because is very useful to explore frequencies with different cross variables.


Solution

  • you are pretty close and only needed a few modifications. the major change is adding in an lapply() to loop through the vars input to create a list of tbl_summary objects. Then I create the tab spanner names from the inputs of vars and append the t0 table to the list created by the lapply(). then you can pass tlist2 to tbl_merge() with the names created with tabspannername to dynamically label the tables.

    library(gtsummary)
    library(dplyr)
    esoph <- mutate(esoph,
                    ncases = ifelse(ncases > 2, "High","Low"))
    
    multiple_table<-function(data, var, vars){
    
      t0 <- data %>% 
        select( var  ) %>% 
        gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
                               digits = list(everything() ~ c(2, 0))) %>%
        modify_header(label ~ "") %>% 
        bold_labels()
      
      
      tlist <-  lapply(vars,function(y){
        esoph %>% 
          select( var  ,  y  ) %>% 
          gtsummary::tbl_summary(by =  y  , statistic = all_categorical()~ "{p}% ({n})",
                                 digits = list(everything() ~ c(2, 0)))
      })
      
     tabspannername <- c("**Total**", paste0("**",vars,"**"))
      
     tlist2 <- append(list(t0), tlist,1)
     
     
     tbl_merge(tbls = tlist2
                ,tab_spanner = tabspannername
               ) %>%
       as_gt() %>% 
       gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
    }
    
    
    x <-  multiple_table(data = esoph, var = "tobgp", vars = c("agegp", "alcgp","ncases"))