How to subset variable in gtsummary


I would like to subset a specific variable (not the entire dataset) in gtsummary.

In the following example, how can I subset gear to drop '5', so that the table shows only the proportion of cars with gear '3' and '4'? I still want mpg summarized across all cars, however.

library(gtsummary) # tbl_summary() comes from gtsummary, not gt
library(dplyr)

mtcars %>%
  select(cyl, mpg, gear) %>%
  tbl_summary(
    by = cyl # how do I say: for gear only, filter gear != 5?
  )

Solution

  • You'll need to build two separate tables with tbl_summary(), one on the full data and one on the filtered data, then stack them with tbl_stack(). Example below!

    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '1.5.0'
    
    tbl_full_data <-
      mtcars %>%
      select(cyl, mpg) %>%
      tbl_summary(by = cyl) %>%
      # removing Ns from header, since they won't be correct for gear
      modify_header(all_stat_cols() ~ "**{level}**")
    
    tbl_gear_subset <-
      mtcars %>%
      select(cyl, gear) %>%
      dplyr::filter(gear != 5) %>%
      tbl_summary(by = cyl) 
    
    # stack tables together
    list(tbl_full_data, tbl_gear_subset) %>%
      tbl_stack() %>%
      as_kable() # convert to kable so it'll print on SO
    #> i Column headers among stacked tables differ. Headers from the first table are
    #> used. Use `quiet = TRUE` to supress this message.
    
    Characteristic  4                  6                  8
    mpg             26.0 (22.8, 30.4)  19.7 (18.6, 21.0)  15.2 (14.4, 16.2)
    gear
      3             1 (11%)            2 (33%)            12 (100%)
      4             8 (89%)            4 (67%)            0 (0%)

    Created on 2021-10-25 by the reprex package (v2.0.1)
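
As a quick sanity check on the gear rows of the stacked table, the counts and column percentages can be recomputed directly in base R. This is only a sketch and not part of the answer above; the names `cars_no_gear5`, `gear_counts`, and `gear_pct` are made up for illustration.

```r
# Drop the gear == 5 rows, mirroring dplyr::filter(gear != 5) above
cars_no_gear5 <- mtcars[mtcars$gear != 5, ]

# Cross-tabulate gear (rows) by cyl (columns)
gear_counts <- table(gear = cars_no_gear5$gear, cyl = cars_no_gear5$cyl)

# Column percentages: share of each gear level within each cyl group
gear_pct <- round(100 * prop.table(gear_counts, margin = 2))

print(gear_counts)
print(gear_pct)
```

The counts and percentages should line up with the gear rows in the stacked table, while the mpg row there is still computed from all 32 cars.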