
R gtsummary tbl_summary with strata and two independent grouping variables


I'm trying to create a tbl_summary that has a strata category and, within each stratum, two separate categorical (likely binary) variables. Here's an example of the layout I want; note that the n/% values for d are placeholders and don't match my example dataset.

[Image: example table structure]

I don't want to combine variables b and c, because they are distinct, independent variables for each observation. I've tried to get this layout with a combination of tbl_summary, tbl_strata, and tbl_merge, but I can't get it working correctly.
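To make the distinction concrete: combining b and c would cross them into a single grouping with one column per b × c combination, whereas keeping them independent tabulates each variable on its own, so every observation is counted once under b and once under c. A small base R sketch using the 10-row example data (no gtsummary needed):

```r
df <- data.frame(
  id = 1:10,
  a = c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red'),
  b = c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no'),
  c = c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak'),
  d = c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47),
  stringsAsFactors = TRUE
)

# Combining b and c crosses them into one grouping: one cell per b x c combination
crossed <- table(interaction(df$b, df$c, sep = " & "))

# Keeping b and c independent tabulates each one separately, so every
# observation is counted once under b and once under c
independent <- c(table(df$b), table(df$c))

crossed
independent
```

The crossed version sums to the 10 observations across four cells, while the independent version sums to 20 because each row appears in both the b and the c block, which is the layout the question is after.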

Here's my minimal example (R v4.2.2):

library(dplyr)
library(gtsummary)

df <- data.frame(id=1:10,
                 a=c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red'),
                 b=c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no'),
                 c=c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak'),
                 d=c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47)
)

df$a <- factor(df$a)
df$b <- factor(df$b)
df$c <- factor(df$c)
df$d <- factor(df$d)

t1 <- df %>%
   select(a, b, d) %>%
   mutate(a = paste("a=", a)) %>%
   mutate(b = paste("b=", b)) %>%
   tbl_strata(
      strata = a,
      .tbl_fun =
         ~ .x %>%
         tbl_summary(by = b, missing = "no"),
      .header = "**{strata}**, N = {n}"
   )

t2 <- df %>%
   select(a, c, d) %>%
   mutate(a = paste("a=", a)) %>%
   mutate(c = paste("c=", c)) %>%
   tbl_strata(
      strata = a,
      .tbl_fun =
         ~ .x %>%
         tbl_summary(by = c, missing = "no"),
      .header = "**{strata}**, N = {n}"
   )

tbl_merge(
   tbls = list(t1, t2),
   tab_spanner = c("**b**", "**c**")
)

This code produces the following table, which doesn't have the correct columns for b and c and is missing the overall header for the strata variable a.

[Image: code output]

Further attempts produced the right layout, but the n/% values are incorrect and duplicated across the strata:

df <- data.frame(id=1:11,
                 a=c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red', 'blue'),
                 b=c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'no'),
                 x=c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese'),
                 d=c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47, 38)
)

df$a <- factor(df$a)
df$b <- factor(df$b)
df$x <- factor(df$x)
df$d <- factor(df$d)

t3 <- df %>%
   select(b, d) %>%
   mutate(b = paste("b=", b)) %>%
   tbl_summary(by = b, 
               missing = "no"
   )

t4 <- df %>%
   select(x, d) %>%
   mutate(x = paste("x=", x)) %>%
   tbl_summary(by = x, 
               missing = "no"
   )

df %>% tbl_strata(
   strata = a,
   .tbl_fun =
      ~tbl_merge(
         tbls = list(t3, t4)
      ),
   .header = "**a={strata}**, N = {n}"
)

I moved tbl_strata() out of the individual tables so that it is applied to the merged table instead, and passed the tbl_merge() call to tbl_strata() as .tbl_fun (without the usual `~ .x %>%` pattern).

[Image: refined but incorrect output]

This is close; I just need to fix the values for d, which are incorrectly duplicated across the levels of a.
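The duplication comes from t3 and t4 being built once from the full df before tbl_strata() runs: the .tbl_fun lambda never uses .x, so every stratum receives the same two pre-computed tables. A minimal sketch of a fix, assuming gtsummary 1.7.x and the 11-row df from the second attempt, builds both summaries inside .tbl_fun from .x (the rows of the current stratum only):

```r
library(dplyr)
library(gtsummary)

# same 11-row example data as the second attempt
df <- data.frame(
  id = 1:11,
  a = c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red', 'blue'),
  b = c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'no'),
  x = c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese'),
  d = factor(c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47, 38))
)

tbl <- df %>%
  # prefix the grouping values so each column header shows its variable
  mutate(b = paste0("b=", b), x = paste0("x=", x)) %>%
  select(a, b, x, d) %>%
  tbl_strata(
    strata = a,
    .tbl_fun = ~ tbl_merge(
      tbls = list(
        # both summaries use .x, i.e. only the rows of the current stratum of a
        tbl_summary(.x, by = b, include = d, missing = "no"),
        tbl_summary(.x, by = x, include = d, missing = "no")
      ),
      tab_spanner = FALSE
    ),
    .header = "**a={strata}**, N = {n}"
  )
tbl
```

The only structural change from the attempt above is where the tbl_summary() calls live; `include = d` keeps the other grouping column out of each summary, since .x still contains both b and x.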


Solution

  • Hope this is what you're after!

    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '1.7.2'
    
    df <- data.frame(id=1:10,
                     a=c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red'),
                     b=c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no'),
                     c=c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak'),
                     d=c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47)
    )
    
    df$a <- factor(df$a)
    df$b <- factor(df$b)
    df$c <- factor(df$c)
    df$d <- factor(df$d)
    
    # first, create a function that builds the table for one stratum:
    # one tbl_summary() per `by` variable, merged side by side
    tbl_summary_half_merge <- function(data, by, include) {
      purrr::map(
        by, 
        ~tbl_summary(data, by = all_of(.x), include = all_of(include)) |> 
          modify_header(all_stat_cols() ~ paste0("**", .x, " = {level}**"))
      ) |> 
        tbl_merge(tab_spanner = FALSE)
    }
    
    # testing our first function
    tbl_summary_half_merge(df, by = c("b", "c"), include = "d") |> as_kable()
    
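    | Characteristic | b = no  | b = yes | c = cheese | c = steak |
    |:---------------|:-------:|:-------:|:----------:|:---------:|
    | d              |         |         |            |           |
    | 19             | 0 (0%)  | 1 (14%) | 1 (25%)    | 0 (0%)    |
    | 22             | 0 (0%)  | 2 (29%) | 1 (25%)    | 1 (17%)   |
    | 27             | 0 (0%)  | 1 (14%) | 1 (25%)    | 0 (0%)    |
    | 38             | 0 (0%)  | 1 (14%) | 0 (0%)     | 1 (17%)   |
    | 44             | 0 (0%)  | 1 (14%) | 0 (0%)     | 1 (17%)   |
    | 47             | 1 (33%) | 0 (0%)  | 0 (0%)     | 1 (17%)   |
    | 56             | 1 (33%) | 0 (0%)  | 0 (0%)     | 1 (17%)   |
    | 61             | 1 (33%) | 0 (0%)  | 0 (0%)     | 1 (17%)   |
    | 82             | 0 (0%)  | 1 (14%) | 1 (25%)    | 0 (0%)    |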
    
    # now use that function with tbl_strata()
    tbl <-
      tbl_strata(
        df, 
        strata = "a",
        .tbl_fun = 
          ~tbl_summary_half_merge(.x, by = c("b", "c"), include = "d"),
        .header = "**a = {strata}**"
      )
    

    [Image: rendered output of `tbl`]

    Created on 2024-04-30 with reprex v2.1.0
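The answer's approach is split-apply-combine: tbl_strata() splits the rows by a, and tbl_summary_half_merge() builds one summary per `by` variable from that stratum's rows and binds them side by side. A base R sketch of the same structure can be used to cross-check the counts. It relies on a hypothetical helper, half_merge_counts(), which is not part of gtsummary; with `percent = TRUE` on the full data it should reproduce the column percentages in the as_kable() output above (tbl_summary() reports column percentages by default).

```r
# Hypothetical base R helper (not part of gtsummary): tabulate `include`
# against each `by` variable separately and bind the results side by side,
# mirroring what tbl_summary_half_merge() does with tbl_summary() + tbl_merge()
half_merge_counts <- function(data, by, include, percent = FALSE) {
  tabs <- lapply(by, function(v) {
    tab <- table(data[[include]], data[[v]])
    # column percentages, the tbl_summary() default
    if (percent) tab <- round(100 * prop.table(tab, margin = 2))
    # prefix the levels so the b and c columns stay distinguishable
    colnames(tab) <- paste0(v, " = ", colnames(tab))
    tab
  })
  do.call(cbind, tabs)
}

df <- data.frame(
  id = 1:10,
  a = c('red', 'blue', 'red', 'red', 'blue', 'red', 'blue', 'blue', 'blue', 'red'),
  b = c('yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no'),
  c = c('cheese', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak', 'cheese', 'steak', 'steak'),
  d = c(22, 82, 44, 56, 27, 61, 22, 19, 38, 47),
  stringsAsFactors = TRUE
)
df$d <- factor(df$d)

# Whole data set: compare with the as_kable() percentages above
overall_pct <- half_merge_counts(df, by = c("b", "c"), include = "d", percent = TRUE)

# Like tbl_strata(): split the rows by a first, then build each
# stratum's table from that stratum's rows only
by_stratum <- lapply(split(df, df$a), half_merge_counts,
                     by = c("b", "c"), include = "d")

overall_pct
by_stratum
```

Because each stratum is built from its own rows, the counts differ between a = blue and a = red (for example, d = 56 appears under b = no only for red), which is what the second attempt in the question lost by building t3 and t4 outside .tbl_fun.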