Search code examples
rsummarygtsummarytbl

R gtsummary Row with Categorical Variable Totals


I have a dataset of approximately 700,000 patients where I have hospital site IDs (factor variable). I would like to create a row where the number of hospitals is visible (this is separate to the number of patients). I have 3 categorical variables as my columns in addition to an overall column.

At the moment, there is a separate row for each hospital id with the number of patients in each site for each category.

My code is as follows:

t1 <- PIR %>% 
  select(siteidn, countryname) %>% 
    tbl_summary(by = countryname ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>%
  bold_labels() %>% 
  italicize_levels() %>% 
  add_overall()

t2 <- PIR %>% 
  select(siteidn, hospt) %>% 
    tbl_summary(by = hospt ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>% 
      bold_labels() %>% 
      italicize_levels()

t3 <- PIR %>% 
  select(siteidn, iculevelname) %>% 
    tbl_summary(by = iculevelname ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>% 
      bold_labels() %>% 
      italicize_levels()

tbl_merge(
  tbls = list(t1, t2, t3),
  tab_spanner = c("**Country**", "**Hospital Type**", "**ICU Level**"))

This produces the following table:

Table 1

As can be seen, there is a separate row for each hospital ID. I'd like to have a single row where there are totals of the number of hospitals in each tier (i.e. total number of hospitals in Aus, NZ, Metropolitan, etc).

My questions are:

  1. Is there a way to get a total row for a factor variable that is not the patient number?
  2. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?
  3. Is there a way to create a row for the number of patients and not have those details in the headings?

Thanks all for your time.

Ben

ADDIT: Here is an image of what I would like the table to look like. I apologise for it's crudeness. I would like to have just one row for the factor variable of total Number of ICUs, rather than have a row of each ICU with the number of patients in it (Red Ink).

Additionally, is there a way to group the 2 rows under a common heading similar to the factor variables (Green Ink).

I appreciate that my R skills are rudementary. Thank you all for your patience!

Ben

Table_Ideal


Solution

  • I agree with Ben, always good to include a dataset we can run on our machine, and an example of what you would like the output look like. Below is a code example that adresses most of your questions.

    1. Is there a way to get a total row for a factor variable that is not the patient number?

    I am not sure what you're looking for here. More details please.

    1. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?

    Yes, you can use the modify_spanning_header() function to remove the header above the Overall column.

    1. Is there a way to create a row for the number of patients and not have those details in the headings?

    Yes, if you create a new column in your dataset that is TRUE for all observations, we can summarize that column and report the N.

    Also, if you're only doing cross tabulations of a single variable, you should look into the tbl_cross() function. It adds the total rows automatically.

    library(gtsummary)
    library(tidyverse)
    set.seed(20210108)
    
    # create dummy dataset
    PIR <- 
      tibble(
        siteidn = sample(c("1325", "1324", "1329"), 100, replace = TRUE) %>% factor(),
        countryname = sample(c("NZ", "Australia"), 100, replace = TRUE) %>% factor(),
        hospt = sample(c("Metro", "Rural"), 100, replace = TRUE) %>% factor(),
        patient = TRUE
      ) %>%
      group_by(siteidn) %>%
      mutate(
        count_site = row_number() == 1L # one TRUE per site
      ) %>%
      ungroup() %>%
      labelled::set_variable_labels(siteidn = "Number of ICUs", # Assigning labels 
                                    patient = "N")
    
    t1 <- PIR %>% 
      select(patient, siteidn, countryname) %>% 
      tbl_summary(
        by = countryname,
        missing = "no", 
        statistic = patient ~ "{n}" # only print N for the top row
      ) %>% 
      modify_header(stat_by = "**{level}**") %>% # Remove the Ns from the header row
      add_overall(col_label = "**Overall**")
    t2 <- PIR %>% 
      select(patient, siteidn, hospt) %>% 
      tbl_summary(
        by = hospt,
        missing = "no", 
        statistic = patient ~ "{n}" # only print N for the top row
      ) %>%
      modify_header(stat_by = "**{level}**") # Remove the Ns from the header row
    
    tbl <-
      tbl_merge(
        tbls = list(t1, t2),
        tab_spanner = c("**Country**", "**Hospital Type**")
      ) %>%
      bold_labels() %>% 
      italicize_levels() %>%
      # remove spanning header for overall column, use `show_header_names(tbl)` to print column names
      modify_spanning_header(stat_0_1 ~ NA) %>%
      modify_footnote(everything() ~ NA) # remove footnote, as it's not informative in this setting
    

    enter image description here

    EDIT: After clarification from original poster, adding another example of how one could present the Ns.

    The table below shows two ways to show the Ns for the patients and the number of sites. The first is on two lines with two variables, and the last line is a way the information can be presented on a single line.

    t1 <- PIR %>% 
      select(patient, site_only = count_site, combination = count_site, countryname) %>% 
      tbl_summary(
        by = countryname,
        missing = "no", 
        statistic = list(c(patient, site_only) ~ "{n}", 
                         combination ~ "Site N {n}; Total N {N}")
      )
    

    enter image description here