How to include confidence intervals for proportion using {gtsummary} tbl_summary?


I'm new to posting on StackOverflow (though not to reading it), so please bear with me.

I am using the {gtsummary} package, in particular the tbl_summary function.

I would like to include a 95% confidence interval of the proportions for each level of the by variable, for all of the included categorical and continuous variables.

Searching through previous posts, I haven't found a solution to this exact problem.

My basic output is created using the following:

tbl <- df %>% 
  select(group, var_cont, var_cat_1, var_cat_2, var_cat_3, var_cat_4) %>% 
  tbl_summary(
    by = group,
    statistic =
      list(
        all_continuous() ~ "{mean} ({sd})",
        all_dichotomous() ~ "{n}/{N} ({p}%)"
      ),
    missing = "no",
    digits = all_continuous() ~ 1
  )

With my data, this produces the following: [image: tbl_summary output]

The majority of my categorical variables are of logical type (i.e. TRUE, FALSE, NA). I would now like to add a column next to each of the group-level columns containing a 95% confidence interval of the proportions, in the form "{ci_lower}%, {ci_upper}%".

After many attempts, and with inspiration from other posts, I wrote a custom function that uses the freq_table() function of the {freqtables} package. I wrote it so that it would fit into the add_stat() function of {gtsummary}.
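To make the target concrete, here is a minimal base-R sketch of the kind of per-group interval described above. The data frame `toy` is hypothetical, and the choice of a Wilson score interval via prop.test() is an assumption; the question does not say which interval method is wanted.

```r
# Hypothetical toy data standing in for the real data frame.
toy <- data.frame(
  group = rep(c("A", "B"), each = 20),
  var_cat_1 = c(rep(TRUE, 6), rep(FALSE, 13), NA,
                rep(TRUE, 12), rep(FALSE, 8))
)

# For each group: drop missing values, count the TRUEs, and ask prop.test()
# for a 95% score (Wilson) interval without continuity correction.
ci_by_group <- sapply(split(toy$var_cat_1, toy$group), function(x) {
  x <- x[!is.na(x)]
  ci <- prop.test(sum(x), length(x), conf.level = 0.95, correct = FALSE)$conf.int
  # Format as "lower%, upper%" to mirror the layout requested above.
  sprintf("%.1f%%, %.1f%%", 100 * ci[1], 100 * ci[2])
})

print(ci_by_group)
```

The result is a named character vector with one formatted interval per level of `group`, which is the content each new column would need to hold.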

ci_function <- function(data, variable, by, ...) {
  
  variable <- enquo(variable)
  by <- enquo(by)
  
  data %>% 
    freq_table(!!by, !!variable) %>% 
    filter %>% 
    filter(col_cat == TRUE) %>% 
    select(row_cat, col_var, n, n_row, percent_row, lcl_row, ucl_row) %>% 
    mutate(
      lcl_row = format(lcl_row, digits = 2),
      ucl_row = format(ucl_row, digits = 2),
      stat = str_glue("{lcl_row}%, {ucl_row}%")
      ) %>% 
    select(stat) %>% 
    t() %>% 
    as_tibble()  %>% 
    set_names(paste0("add_stat_", seq_len(ncol(.))))  
}

Using ci_function on its own with a selection of the above variables gives me the following:

# A tibble: 1 x 3
  add_stat_1   add_stat_2   add_stat_3  
  <chr>        <chr>        <chr>       
1 0.19%,  9.3% 2.53%, 16.9% 0.34%, 16.3%

When I try to apply ci_function through add_stat():

tbl <- stack_overflow %>% 
    select(group, var_cont, var_cat_1, var_cat_2, var_cat_3, var_cat_4) %>% 
    tbl_summary(
      by = group,
      statistic =
        list(
          all_continuous() ~ "{mean} ({sd})",
          all_dichotomous() ~ "{n}/{N} ({p}%)"
        ),
      missing = "no",
      digits = all_continuous() ~ 1
    ) %>% 
  add_stat(everything() ~ "ci_function") %>%
  modify_table_body(
    dplyr::relocate, add_stat_1, .after = stat_1
  ) %>%
  modify_header(starts_with("add_stat_") ~ "**95% CI**")

... I get the following error messages (the one for the continuous variable is expected):

There was an error for variable 'var_cont':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_1':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_2':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_3':
Error: `nm` must be `NULL` or a character vector the same length as `x`
There was an error for variable 'var_cat_4':
Error: `nm` must be `NULL` or a character vector the same length as `x`

... and an incomplete table: [image: tbl_summary output]

I am a big fan of the {gtsummary} package and its customization possibilities.

Can anyone help me correct my custom function ci_function so that it works for both categorical and continuous variables, and show me how to use it with the add_stat function of {gtsummary}?

Cheers!

Steffen


Solution

  • UPDATE 2022-02-13: The solution now uses the add_ci() function, which reduces the amount of code significantly.

    Use the add_ci() function to add columns of confidence intervals.

    library(gtsummary)
    packageVersion("gtsummary")
    #> [1] '1.5.2'
    
    tbl <-
      trial %>%
      select(trt, age, response, grade) %>%
      tbl_summary(
        by = trt,
        missing =  "no",
        statistic = list(all_categorical() ~ "{n}/{N} ({p}%)",
                         all_continuous() ~ "{mean} ({sd})")
      ) %>%
      add_ci()
    

    Created on 2022-02-13 by the reprex package (v2.0.1)
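If the layout of the interval cells needs adjusting, for example to match the "{ci_lower}%, {ci_upper}%" form asked for in the question, the following sketch shows one way it might be done. It assumes that add_ci() accepts `statistic` and `conf.level` arguments and that `{conf.low}` and `{conf.high}` are its placeholders, as I understand the gtsummary documentation; the name `tbl_ci` is only illustrative, and dplyr is loaded explicitly rather than relying on re-exports.

```r
library(gtsummary)
library(dplyr)

tbl_ci <-
  trial %>%
  select(trt, age, response, grade) %>%
  tbl_summary(
    by = trt,
    missing = "no",
    statistic = list(all_categorical() ~ "{n}/{N} ({p}%)",
                     all_continuous() ~ "{mean} ({sd})")
  ) %>%
  add_ci(
    # Text template for each CI cell, per variable type.
    # Assumption: {conf.low} and {conf.high} are the placeholders add_ci() fills in.
    statistic = list(all_categorical() ~ "{conf.low}%, {conf.high}%",
                     all_continuous() ~ "{conf.low}, {conf.high}"),
    # Confidence level for every interval in the table.
    conf.level = 0.95
  )
```

Changing the templates (say, to "[{conf.low}, {conf.high}]") or the value of `conf.level` should be enough to adapt the output without writing a custom add_stat() function.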