In R, when presenting RCT data in a table (preferably with gtsummary), how do I include tests of significance both within groups and between groups?

I am learning how to analyse the data and present the results of an RCT using R. I have read the package documentation and searched online, but did not find a solution. I have 2 groups of participants, and for each outcome I want to show, in one table, the baseline data for both groups, the change within each group (from baseline to endpoint), and the difference between the groups. I have attached an example table below.

I have simulated a data frame and tried writing some code; the issues I ran into are described below.

ID <- 1:50
data <- data.frame(ID)
data$drug <- rbinom(n = 50, 1, prob = 0.5)
data$drug <- factor(data$drug, levels = c(0, 1), 
                    labels = c("Drug X", "Drug Y"))
data$wt_0 <- rnorm(n = 50, mean = 70, sd = 5)
data$wt_12 <- rnorm(50, 68, 4.9)
head(data)

library(gtsummary)
library(gt)
subset(data, select = -ID) %>%
  tbl_summary(by = drug) %>% 
  add_p()

I tried adding the change in weight column manually

data_new <- data
# add the change to data_new (not data), since data_new is what gets tabulated
data_new$wt_change <- data_new$wt_0 - data_new$wt_12
subset(data_new, select = -ID) %>%
  tbl_summary(by = drug) %>% 
  add_p()

I want a table like the example, where each row corresponds to exactly one outcome. Is this feasible with the gtsummary package, or any other package in R? It would be great if someone could help, because this is probably a common scenario.

Note: Yes, we are aware of the multiplicity issue. We will state that all testing other than the primary test is exploratory and should not be interpreted as confirmatory.

Solution

To get just one row for each variable (weight, BMI, etc.), it may be necessary to use a reshaped data frame:

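library(dplyr)  # select() and mutate() are used further down

# long format: one row per participant per time point
df <- data %>%
  tidyr::pivot_longer(starts_with("wt"), 
               names_to="week", values_to="weight", names_prefix="wt_")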

# A tibble: 100 x 4
      ID drug   week  weight
   <int> <fct>  <chr>  <dbl>
 1     1 Drug X 0       66.3
 2     1 Drug X 12      70.2
 3     2 Drug X 0       72.3
 4     2 Drug X 12      69.6
 5     3 Drug X 0       78.2

Then you can call tbl_summary() with by = week inside tbl_strata(), stratifying on drug, and add add_difference() to obtain the "Mean change" column for each drug.

tbl_1 <- df |> 
  select(-ID) |>
  tbl_strata(strata = drug,
      .tbl_fun = ~ tbl_summary(.x, by = week,
                    label=list(weight~"Weight (kg)"),
                    digits=list(everything() ~ 2),
                    statistic = list(all_continuous() ~ "{mean} ({sd})")) |>
        add_difference(estimate_fun = weight~function(x) style_number(x, digits = 2)), 
      .header = "**{strata}**") |>
  modify_header(all_stat_cols() ~ "**{level} weeks**",
                estimate_1 ~"**Mean change**",
                estimate_2 ~"**Mean change**")
tbl_1

Unfortunately, add_difference() calculates group 1 - group 2 (here, week 0 - week 12), when you probably want group 2 - group 1 (week 12 - week 0).
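To check the sign and size of the within-group change independently of gtsummary, the same quantity can be computed in base R. This is only a minimal sketch on freshly simulated data shaped like the question's; the object name `sim`, the column `change`, and the seed are my own assumptions. It uses a paired t-test, since the same participants are measured at both time points. As far as I know, add_difference() defaults to an unpaired (Welch) t-test for continuous variables, so its p-values and intervals can differ from the paired ones here.

```r
# Simulated data with the same shape as in the question (hypothetical values)
set.seed(123)
sim <- data.frame(
  ID    = 1:50,
  drug  = factor(rbinom(50, 1, 0.5), levels = c(0, 1),
                 labels = c("Drug X", "Drug Y")),
  wt_0  = rnorm(50, mean = 70, sd = 5),
  wt_12 = rnorm(50, mean = 68, sd = 4.9)
)

# Change defined as endpoint minus baseline (week 12 - week 0)
sim$change <- sim$wt_12 - sim$wt_0

# Mean change within each drug group
mean_change <- sapply(split(sim$change, sim$drug), mean)

# Paired t-test within each group: week 12 vs week 0, same participants
paired_tests <- lapply(split(sim, sim$drug), function(g) {
  t.test(g$wt_12, g$wt_0, paired = TRUE)
})

# Collect estimate, 95% CI and p-value per group
within_summary <- t(sapply(paired_tests, function(tt) {
  c(mean_change = unname(tt$estimate),
    lower       = tt$conf.int[1],
    upper       = tt$conf.int[2],
    p_value     = tt$p.value)
}))
print(round(within_summary, 3))
```

With the change defined as week 12 - week 0, a negative value means weight loss; swapping the two arguments of t.test() flips the sign of the estimate and the interval but leaves the p-value unchanged.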

To get the "T-test" column that compares the changes over time between the two drugs, again you can use add_difference().

tbl_2 <- mutate(data, weight=wt_0 - wt_12) |>
  select(drug, weight) |>
  tbl_summary(by=drug,
              label=list(weight~"Weight (kg)"),
              digits=list(everything() ~ 2)) |>
  add_difference(estimate_fun=weight~function(x) style_number(x, digits = 2)) |>
  modify_column_hide(c(stat_1, stat_2)) 

tbl_2
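The between-group comparison in tbl_2 can be cross-checked in base R. The sketch below is an illustration on simulated data, not the answer's exact output; `sim`, `change` and the seed are assumed names and values. It defines the change as baseline minus endpoint, as in the mutate() call above, and runs a Welch two-sample t-test of that change between the two drugs.

```r
# Simulated data with the same shape as in the question (hypothetical values)
set.seed(123)
sim <- data.frame(
  drug  = factor(rbinom(50, 1, 0.5), levels = c(0, 1),
                 labels = c("Drug X", "Drug Y")),
  wt_0  = rnorm(50, mean = 70, sd = 5),
  wt_12 = rnorm(50, mean = 68, sd = 4.9)
)

# Same definition of change as in tbl_2: baseline minus endpoint
sim$change <- sim$wt_0 - sim$wt_12

# Welch two-sample t-test of the change between the two drugs
between <- t.test(change ~ drug, data = sim)

# Difference in mean change: first factor level minus second (Drug X - Drug Y)
group_means    <- unname(between$estimate)
diff_x_minus_y <- group_means[1] - group_means[2]

between_summary <- c(difference = diff_x_minus_y,
                     lower      = between$conf.int[1],
                     upper      = between$conf.int[2],
                     p_value    = between$p.value)
print(round(between_summary, 3))
```

The confidence interval returned by t.test() refers to the first factor level minus the second, so it lines up with `diff_x_minus_y` without any sign change.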

And because we ensured that the names and labels of the two calculated variables were the same, we can use tbl_merge to join these two gtsummary objects together:

tbl_merge(list(tbl_1, tbl_2)) |>
  modify_spanning_header(ends_with("1_1")~"**Drug X**",
                         ends_with("2_1")~"**Drug Y**",
                         ends_with("_2")~"**T-Test**")

Data:

set.seed(123) # data created by OP.