I have a dummy dataset df
which contains data on the treatment, disease grade and survival outcome of 50 people.
library(Hmisc)
# Generate initial cases.
set.seed(123)
n <- 50
df <- data.frame(
treatment = sample(0:1, n, replace = TRUE),
grade = sample(0:1, n, replace = TRUE),
death = sample(0:1, n, replace = TRUE)
)
# Add labels to columns.
label(df$treatment) <- "Drug Intervention"
label(df$grade) <- "Cancer Grade"
label(df$death) <- "Death Occurrence"
# Factor the values.
df$treatment <- factor(df$treatment, levels = c(0, 1), labels = c("Drug A", "Drug B"))
df$grade <- factor(df$grade, levels = c(0, 1), labels = c("Grade 1", "Grade 2"))
df$death <- factor(df$death, levels = c(0, 1), labels = c("Survived", "Died"))
I want to generate a 2 x 3 cross table which shows the number of people who survived by grade and treatment. I am using tbl_strata()
and tbl_summary()
from gtsummary to do this.
This code is getting close to the desired outcome:
library(tidyverse)
library(gtsummary)
df %>%
tbl_strata(
strata = grade,
~.x %>%
tbl_summary(
by = death,
percent = "row"
))
Which produces a plot that looks like this:
Characteristic | Grade 1 Survived, N = 10 | Grade 1 Died, N = 17 | Grade 2 Survived, N = 9 | Grade 2 Died, N = 14 |
---|---|---|---|---|
Treatment | ||||
Drug A | 5 (28%) | 13 (72%) | 5 (42%) | 7 (58%) |
Drug B | 5 (56%) | 4 (44%) | 4 (36%) | 7 (64%) |
However the desired output is:
Treatment | Grade 1 Died | Grade 2 Died | P-Value |
---|---|---|---|
Drug A | 13 (72%) | 7 (58%) | 0.46 |
Drug B | 4 (44%) | 7 (64%) | 0.65 |
How can I use gtsummary to filter out/collapse the 'Survived` columns to simplify the table, and is it possible to add a Fisher's exact p-value for the relationship between death and grade (for the two separate drugs)?
There are multiple ways to accomplish this. You could filter out individuals who survived since you're only interested in death occurrences, then use tbl_strata() to stratify by treatment rather than grade. Finally, inside tbl_strata(), you can use tbl_summary() to summarize by grade and calculate percentages and use add_p() for your p-values (or whichever method of p-value calculation you prefer).
Here is an example of the structure I am thinking of:
df_filtered <- df %>%
filter(death == "Died")
table_output <- df_filtered %>%
tbl_strata(
strata = treatment,
.f = ~ .x %>%
tbl_summary(
by = grade,
missing = "no"
) %>%
add_n() %>% # Add counts
modify_header(label = "**Grade**") %>% # Modify the header
add_p(test = list(all_continuous() ~ "fisher.test")) # Add Fisher's exact test p-value
)
print(table_output)
Another option is to use tbl_strata(strata = treatment)
to stratify by treatment, and create separate tables for Drug A and Drug B.
To get your count and percentage format, you can use statistic = list(all_categorical() ~ "{n} ({p}%)")
.
And you can specify Fisher's exact explicitly as well: add_p(test = all_categorical() ~ "fisher")
.
This way, your code might look something like this:
df %>%
tbl_strata(
strata = treatment,
~.x %>%
tbl_summary(
by = grade,
statistic = list(all_categorical() ~ "{n} ({p}%)"),
missing = "no"
) %>%
add_p(test = all_categorical() ~ "fisher.test")
)
You can further modify various aspects of formatting like this:
%>% modify_header(stat_by = "Grade")
%>% as_gt() %>%
tbl_caption(
caption = "Captions"
)