Simplifying dichotomous variables in gtsummary


Apologies as I'm sure this has been asked before, but I haven't been able to find it.

In gtsummary::tbl_summary(), when I have a dichotomous variable (e.g., male vs. female sex), I would like the generated table to print only the level with the highest percentage, as would be typical of a final table (i.e., printing both male: 47% and female: 53% is redundant).

For example, I would like the following code to generate a table that has 'Male sex': 6 (60%) instead of listing both male and female:

library(tibble)
library(gtsummary)

set.seed(123) 
study_id <- 1:10
sex <- sample(c("Male", "Female"), 10, replace = TRUE)
age <- sample(18:65, 10, replace = TRUE)

df <- tibble(study_id, sex, age)

df %>% tbl_summary()

This is obviously easy to edit in Word, but I would rather not.

Thanks


Solution

  • This may get you started:

    df |> tbl_summary(type = list(sex ~ "dichotomous"), 
                      value = list(sex ~ "Male"), 
                      label = list(sex ~ "Male"))
    

    If you want R to pick up the most frequent category:

    pick_freq <- names(which.max(table(df$sex)))
    df |> tbl_summary(type = list(sex ~ "dichotomous"), 
                      value = list(sex ~ pick_freq), 
                      label = list(sex ~ pick_freq))
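
A note on how pick_freq behaves, as a small base-R sketch. The vectors below are made up for illustration and are not the data from the question. table() drops NA by default and orders character values alphabetically, and which.max() returns the first maximum, so a tie goes to the alphabetically first level.

```r
# Toy vector (illustrative only): two "Male", one "Female", one missing
x <- c("Male", "Female", "Male", NA)
counts <- table(x)                      # NA is dropped by default (useNA = "no")
most_common <- names(which.max(counts)) # level with the largest count

# On a tie, which.max() returns the first maximum; table() sorts
# character values, so the alphabetically first level is picked
tied <- c("Male", "Female")
tie_pick <- names(which.max(table(tied)))

print(most_common)
print(tie_pick)
```

If a tie should be handled differently (for example, always reporting a fixed reference level), it is probably simpler to hard-code the level in value = as in the first snippet.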