
How to round ALL figures to the nearest 5 using tbl_summary in R?


I am only allowed to output data rounded to the nearest 5. I have figured out how to do this for the categorical data rows, but the header N's and the percentages are unchanged.

Example code:

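library(dplyr)
library(gtsummary)

ggplot2::mpg %>%
  select(manufacturer, displ, drv) %>%  # displ must be selected for the `type` argument below
  tbl_summary(by = drv,
              # round categorical counts to the nearest 5; show percentages with 0 decimals
              digits = list(all_categorical() ~ c(function(x) round(x / 5) * 5, 0)),
              type = list(displ = "continuous2"))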

[Image: output of the code above]

I want, instead of (e.g.) N=103 to display N=105. I'd also like the percentages to correspond to the rounded values.

Thanks!


Solution

  • @Edward's response is great! If you want the percentages in your table to be calculated from the rounded n's, you'll need to take a step back from tbl_summary(). The tbl_summary() function uses the {cards} package to perform all tabulations, and that is where we need to go to change how the percentage is calculated.

    In the example below, we first calculate an analysis result dataset (ARD) using cards, then pass that ARD to tbl_ard_summary() to build the table. It's a bit complex, but I am not sure of a simpler way!

    library(gtsummary)
    library(cards)
    
    ard <-
      # calculate counts and big N
      ard_categorical(
        ggplot2::mpg,
        variables = manufacturer,
        by = drv,
        statistic = ~c("n", "N"),
        # round little n to the nearest 5
        fmt_fn = ~list(n = function(x) round(x / 5) * 5)
      ) |> 
      # add the percentage using n rounded to the nearest 5
      cards::add_calculated_row(
        expr = round(n / 5) * 5 / N,
        stat_name = "p",
        stat_label = "%",
        fmt_fn = label_style_percent()
      ) |>
      # add calculations for the continuous summaries
      bind_ard(
        ard_stack(
          ggplot2::mpg,
          .by = drv,
          ard_continuous(variables = displ)
        )
      )
    
    print(ard, n = 3) # only print the first three rows
    #> {cards} data frame: 168 x 11
    #>   group1 group1_level     variable variable_level stat_name stat_label stat
    #> 1    drv            4 manufacturer           audi         n          n   11
    #> 2    drv            4 manufacturer           audi         N          N  103
    #> 3    drv            f manufacturer           audi         n          n    7
    #> ℹ 165 more rows
    #> ℹ Use `print(n = ...)` to see more rows
    #> ℹ 4 more variables: context, fmt_fn, warning, error
    
    # use the ARD-first workflow to create the table
    ard |> 
      tbl_ard_summary(
        by = drv,
        statistic = all_categorical() ~ "{n} / {N} ({p}%)" # show the big N to illustrate that the percentage calculation is correct
      ) |> 
      modify_header(all_stat_cols() ~ "**{level}**  \n N = {round(n/5)*5}") |> 
      bold_labels() |>
      # print table as kable so it renders on stackoverflow
      as_kable()
    
    |Characteristic |    4 N = 105     |    f N = 105    |    r N = 25    |
    |:--------------|:----------------:|:---------------:|:--------------:|
    |manufacturer   |                  |                 |                |
    |audi           | 10 / 103 (9.7%)  | 5 / 106 (4.7%)  |  0 / 25 (0%)   |
    |chevrolet      |  5 / 103 (4.9%)  | 5 / 106 (4.7%)  | 10 / 25 (40%)  |
    |dodge          | 25 / 103 (24%)   | 10 / 106 (9.4%) |  0 / 25 (0%)   |
    |ford           | 15 / 103 (15%)   |  0 / 106 (0%)   | 10 / 25 (40%)  |
    |honda          |  0 / 103 (0%)    | 10 / 106 (9.4%) |  0 / 25 (0%)   |
    |hyundai        |  0 / 103 (0%)    | 15 / 106 (14%)  |  0 / 25 (0%)   |
    |jeep           | 10 / 103 (9.7%)  |  0 / 106 (0%)   |  0 / 25 (0%)   |
    |land rover     |  5 / 103 (4.9%)  |  0 / 106 (0%)   |  0 / 25 (0%)   |
    |lincoln        |  0 / 103 (0%)    |  0 / 106 (0%)   |  5 / 25 (20%)  |
    |mercury        |  5 / 103 (4.9%)  |  0 / 106 (0%)   |  0 / 25 (0%)   |
    |nissan         |  5 / 103 (4.9%)  | 10 / 106 (9.4%) |  0 / 25 (0%)   |
    |pontiac        |  0 / 103 (0%)    | 5 / 106 (4.7%)  |  0 / 25 (0%)   |
    |subaru         | 15 / 103 (15%)   |  0 / 106 (0%)   |  0 / 25 (0%)   |
    |toyota         | 15 / 103 (15%)   | 20 / 106 (19%)  |  0 / 25 (0%)   |
    |volkswagen     |  0 / 103 (0%)    | 25 / 106 (24%)  |  0 / 25 (0%)   |
    |displ          |  4.0 (2.8, 4.7)  | 2.4 (2.0, 3.0)  | 5.4 (4.6, 5.7) |

    Created on 2024-12-12 with reprex v2.1.1
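
To sanity-check the numbers in the table, here is a minimal base R sketch of the arithmetic for the audi row. It is not part of the answer's reprex: the helper `round5()` and the vectors `n_raw` and `N_raw` are illustrative names, with the counts copied from the ARD printout and the table. The last step, dividing by a rounded N, is only an assumption about what you might want if the header N and the percentage have to agree; the answer itself divides by the unrounded N.

```r
# Round a count to the nearest multiple of 5 (same expression the answer uses)
round5 <- function(x) round(x / 5) * 5

# Raw audi counts (n) and column totals (N) for drv = 4 and drv = f,
# copied from the ARD printout and the table
n_raw <- c(`4` = 11, f = 7)
N_raw <- c(`4` = 103, f = 106)

# Rounded counts and rounded column totals
n_5 <- round5(n_raw)  # 10 and 5, as shown in the table cells
N_5 <- round5(N_raw)  # 105 and 105, as shown in the header

# Percentage as computed in the answer: rounded n over the unrounded N
pct_raw_N <- sprintf("%.1f%%", 100 * n_5 / N_raw)
pct_raw_N  # "9.7%" "4.7%", matching the audi row

# Hypothetical variant: rounded n over the rounded N
pct_rounded_N <- sprintf("%.1f%%", 100 * n_5 / N_5)
pct_rounded_N  # "9.5%" "4.8%"
```

If the second variant is what you need, the same change could presumably be made in the `expr` passed to add_calculated_row(), rounding N there as well.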