Search code examples
gtsummarypropensity-score-matchingsmd

How to calculate SMD between 3 groups or more?


I'm interested in calculating pairwise standardized mean differences(SMD) by one stratifying variable. Usually this is calculated between two groups, but can we make this calculation in 3 groups or more?

P.S. I'm a big fan of gtsummary package, so I attempted to do this analysis using example 2 from this amazing package as follows:

library(tidyverse)
library(gtsummary)
#> #BlackLivesMatter
add_difference_ex2 <-
  trial %>%
  mutate(trt=ifelse(age<40,"Drug C", trt)) %>% 
  select(trt, age, marker, grade, stage) %>%
  tbl_summary(
    by = trt,
    statistic = list(all_continuous() ~ "{mean} ({sd})"),
    missing = "no",
    include = c(age, marker, trt)
  ) %>%
  add_n() %>%
  add_difference(adj.vars = c(grade, stage))
#> 11 observations missing `trt` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `trt` column before passing to `tbl_summary()`.
#> Error: 'tbl_summary'/'tbl_svysummary' object must have a `by=` value with exactly two levels

Created on 2021-10-27 by the reprex package (v2.0.1)


Solution

  • To add the pairwise standardized mean differences (SMD), you first need to define a function that will calculate and return the pairwise SMD estimates. Once you've done that, you can add it to the gtsummary table using the generic function add_stat(). Example Below!

    library(gtsummary)
    library(tidyverse)
    
    # function to calculate pairwise smd
    pairwise_smd <- function(data, variable, by, ...) {
      data <- 
        dplyr::select(data, all_of(c(variable, by))) %>%
        rlang::set_names(c("variable", "by")) %>%
        dplyr::filter(complete.cases(.)) %>%
        arrange(desc(.data$by))
      
      tibble(exclude = unique(data$by)) %>%
        mutate(
          include = map_chr(.data$exclude, ~unique(data$by) %>% setdiff(.x) %>% paste(collapse = " vs. ")),
          data_subset = 
            map(
              .data$exclude, 
              ~data %>%
                filter(!.data$by  %in% .x) %>%
                mutate(by = factor(.data$by))
            ),
          smd = map_dbl(.data$data_subset, ~smd::smd(.x$variable, .x$by)$estimate)
        ) %>%
        select(include, smd) %>%
        spread(include, smd)
    }
    
    tbl <-
      trial %>%
      select(age, grade, stage) %>%
      tbl_summary(
        by = grade,
        statistic = list(all_continuous() ~ "{mean} ({sd})"),
        missing = "no"
      ) %>%
      add_stat(fns = everything() ~ pairwise_smd)
    

    enter image description here Created on 2021-10-27 by the reprex package (v2.0.1)