How can I create a cross-table by multiple variables in R?

I have a dummy dataset df which contains data on the treatment, disease grade and survival outcome of 50 people.

library(Hmisc)

# Generate initial cases.

set.seed(123)
n <- 50

df <- data.frame(
  treatment = sample(0:1, n, replace = TRUE),
  grade = sample(0:1, n, replace = TRUE),
  death = sample(0:1, n, replace = TRUE)
)

# Add labels to columns.

label(df$treatment) <- "Drug Intervention"
label(df$grade) <- "Cancer Grade"
label(df$death) <- "Death Occurrence"

# Factor the values.

df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))

I want to generate a 2 x 3 cross table which shows the number of people who survived by grade and treatment. I am using tbl_strata() and tbl_summary() from gtsummary to do this.

This code is getting close to the desired outcome:

library(tidyverse)
library(gtsummary)

df %>% 
  tbl_strata(
    strata = grade,
    ~.x %>%
      tbl_summary(
        by = death,
        percent = "row"
      ))

Which produces a table that looks like this:

Characteristic	Grade 1 Survived, N = 10	Grade 1 Died, N = 17	Grade 2 Survived, N = 9	Grade 2 Died, N = 14
Treatment
Drug A	5 (28%)	13 (72%)	5 (42%)	7 (58%)
Drug B	5 (56%)	4 (44%)	4 (36%)	7 (64%)

However, the desired output is:

Treatment	Grade 1 Died	Grade 2 Died	P-Value
Drug A	13 (72%)	7 (58%)	0.46
Drug B	4 (44%)	7 (64%)	0.65

How can I use gtsummary to filter out or collapse the 'Survived' columns to simplify the table, and is it possible to add a Fisher's exact p-value for the relationship between death and grade (separately for each of the two drugs)?

Solution

There are multiple ways to accomplish this. One is to filter out the individuals who survived, since you are only interested in deaths, then use tbl_strata() to stratify by treatment rather than grade. Inside tbl_strata(), use tbl_summary() with by = grade to summarize by grade, and add_p() for the p-values (with whichever test you prefer).

Here is an example of the structure I am thinking of:

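df_filtered <- df %>% 
  filter(death == "Died") # Keep only the deaths

table_output <- df_filtered %>%
  tbl_strata(
    strata = treatment,
    .tbl_fun = ~ .x %>% # tbl_strata() names this argument .tbl_fun, not .f
      tbl_summary(
        by = grade,
        missing = "no"
      ) %>%
      add_n() %>% # Add counts
      modify_header(label = "**Grade**") %>% # Modify the header
      # The variables are categorical, so select them with all_categorical().
      # Caveat: after filtering, death has only one observed level, so a test
      # of death by grade is only meaningful on the unfiltered data.
      add_p(test = list(all_categorical() ~ "fisher.test"))
  )
print(table_output)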

Another option is to keep all rows and use tbl_strata(strata = treatment) to stratify by treatment, which creates separate tables for Drug A and Drug B.

To get your count and percentage format, you can use statistic = list(all_categorical() ~ "{n} ({p}%)").

And you can specify Fisher's exact test explicitly as well: add_p(test = all_categorical() ~ "fisher.test").

This way, your code might look something like this:

df %>%
  tbl_strata(
    strata = treatment,
    ~.x %>%
      tbl_summary(
        by = grade,
        statistic = list(all_categorical() ~ "{n} ({p}%)"),
        missing = "no"
      ) %>%
      add_p(test = all_categorical() ~ "fisher.test")
  )
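
If only the Died row is wanted, one possibility is to treat death as a dichotomous variable and stack the per-treatment tables, so each drug becomes its own row group with Grade 1, Grade 2 and p-value columns. The following is a sketch under some assumptions rather than a tested recipe for your exact gtsummary version: it rebuilds the question's df without the Hmisc labels, uses the native pipe (R 4.1 or later), and the object name tbl_died is purely illustrative.

```r
library(gtsummary)

# Rebuild the example data from the question (Hmisc labels omitted).
set.seed(123)
n <- 50
df <- data.frame(
  treatment = sample(0:1, n, replace = TRUE),
  grade = sample(0:1, n, replace = TRUE),
  death = sample(0:1, n, replace = TRUE)
)
df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))

tbl_died <- df |>
  tbl_strata(
    strata = treatment,
    .tbl_fun = ~ .x |>
      tbl_summary(
        by = grade,
        type = list(death ~ "dichotomous"), # show a single row for death...
        value = list(death ~ "Died"),       # ...reporting the "Died" level
        label = list(death ~ "Died"),
        missing = "no"
      ) |>
      # Drop the per-column N from the headers, since the stacked tables
      # share one header row but have different group sizes.
      modify_header(all_stat_cols() ~ "**{level}**") |>
      add_p(test = list(death ~ "fisher.test")),
    .combine_with = "tbl_stack" # one block of rows per treatment
  )

tbl_died
```

Because tbl_summary() uses column percentages by default, each cell is the share of deaths within that grade for the given drug, which is the quantity shown in the desired output.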

You can further modify various aspects of formatting like this:

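tbl %>% # tbl stands for the gtsummary table built above (placeholder name)
  modify_header(all_stat_cols() ~ "**{level}**") %>% # Relabel the grade columns
  as_gt() %>% # Convert to a gt table
  gt::tab_caption(caption = "Captions") # Add a caption with gt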
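
To cross-check the p-values that gtsummary reports, the same test can be run in base R. This sketch assumes the counts shown in the question's first output table (typed in by hand below) and uses fisher.test() from the stats package; the names drug_a, drug_b and p_values are illustrative.

```r
# Counts copied from the tbl_summary output in the question.
# Rows are grades, columns are outcomes.
drug_a <- matrix(
  c(5, 13,
    5, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(grade = c("Grade 1", "Grade 2"),
                  outcome = c("Survived", "Died"))
)
drug_b <- matrix(
  c(5, 4,
    4, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(grade = c("Grade 1", "Grade 2"),
                  outcome = c("Survived", "Died"))
)

# Fisher's exact test of death by grade, run separately for each drug.
p_values <- sapply(
  list("Drug A" = drug_a, "Drug B" = drug_b),
  function(m) fisher.test(m)$p.value
)

round(p_values, 2)
```

These should agree with the p-values from add_p() when it is run on the unfiltered data, since both test the grade by death table within each treatment.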