Add Percentage to Grouping Variable in gtsummary Package


I am creating a gtsummary table stratified by mortality status (the variable "fate") using the Bernard data included in the pubh package.

I want to add the percentage of "Dead" and "Alive" next to their counts in the column headers, but since fate is the grouping variable, I haven't been able to configure this.

This is my sample code for the table:

library(pubh)
library(dplyr)
library(gtsummary)

data("Bernard")


na.omit(Bernard) %>%
  select(fate, race, apache) %>%
  tbl_summary(
    by = fate,
    type = list(race ~ "categorical", apache ~ "continuous"),
    statistic = list(all_continuous() ~ "{min}, {max}", all_categorical() ~ "{p}%"),
    digits = list(all_continuous() ~ 2, all_categorical() ~ 2),
    missing_text = "(Missing)"
  ) %>%
  add_stat_label() %>%
  modify_header(label ~ "**Variable**") %>%
  modify_caption("**Table 1. Summary statistics by Mortality Status**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Fate**") %>%
  bold_labels() %>%
  italicize_labels() %>%
  italicize_levels()

And this is the output:

Rendered table

Ideally, I would like to have the table show:

Alive, N = 96 (67%) Dead, N = 47 (33%)

I have tried listing the fate variable as categorical and then providing the statistic for percentage:

type =  list(c(race, fate) ~ "categorical", apache ~ "continuous"),   
statistic = list(all_continuous() ~ "{min}, {max}", all_categorical() ~ "{p}%", fate ~ "{p}%"),

This did not work.

I also suspect that creating a new variable with mutate() before calling tbl_summary() would work, but I am curious whether this can be configured directly within tbl_summary().


Solution

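  • You can add the percentage to the header using the modify_header() function. In the header string, {level} is the level of the by variable, {n} is the number of observations in that group, and {p} is the group's proportion, which style_percent() converts to a percentage. Example below!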

    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '1.6.3'
    
    trial %>%
      tbl_summary(
        by = trt, 
        include = age
      ) %>%
      modify_header(all_stat_cols() ~ "**{level}**, N={n} ({style_percent(p)}%)") %>%
      as_kable() # convert to kable to display on stackoverflow
    
    Characteristic   Drug A, N=98 (49%)   Drug B, N=102 (51%)
    Age              46 (37, 59)          48 (39, 56)
    Unknown          7                    4

    Created on 2022-12-24 with reprex v2.0.2
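
Applied to the Bernard data from the question, the same idea might look like the sketch below. It reuses the question's own tbl_summary() call and only adds a second modify_header() for the statistic columns; it assumes the pubh package is installed and that the table is stratified by fate as in the question. The object name tbl is just for illustration.

```r
library(pubh)       # provides the Bernard data used in the question
library(dplyr)
library(gtsummary)

data("Bernard")

tbl <- na.omit(Bernard) %>%
  select(fate, race, apache) %>%
  tbl_summary(
    by = fate,
    type = list(race ~ "categorical", apache ~ "continuous"),
    statistic = list(all_continuous() ~ "{min}, {max}", all_categorical() ~ "{p}%"),
    digits = list(all_continuous() ~ 2, all_categorical() ~ 2),
    missing_text = "(Missing)"
  ) %>%
  add_stat_label() %>%
  modify_header(label ~ "**Variable**") %>%
  # {level}, {n} and {p} are filled in separately for each level of `fate`;
  # style_percent() turns the proportion p into a percentage
  modify_header(all_stat_cols() ~ "**{level}**, N = {n} ({style_percent(p)}%)") %>%
  modify_caption("**Table 1. Summary statistics by Mortality Status**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Fate**") %>%
  bold_labels() %>%
  italicize_labels() %>%
  italicize_levels()

tbl
```

Each statistic column header should then show the group's count followed by its percentage, so no extra mutate() step is needed before tbl_summary().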