Search code examples
rgtsummary

Report p-value from lme model (random intercept) with gtsummary using tbl_summary() and add_p()


I am working in R using gtsummary, trying to report the p-value from the random intercept model

lme(Variable1 ~ Study_Group, random = ~1|ID, na.action = na.omit, data = study_dat) 

in a summary/comparison table using

lme_stat_function <- function(data, variable, by, group, ...){
  nlme::lme(variable ~ by, random = ~1|group, na.action = na.omit, data = data) %>% 
    broom.mixed::tidy()    
}

lme_stat_render <- list(
  all_continuous() ~ "lme_stat_function"
)

study_dat %>% 
  select(ID, Study_Group, Variable1) %>% 
  tbl_summary(
    by = Study_Group,
    type = list(
      where(is.numeric) ~ "continuous2"
    )
  ) %>%
  bold_labels() %>%
  italicize_levels() %>%
  add_p(
    test = lme_stat_render 
  ) 

I have been wrestling with this seemingly simple issue for days and I cannot seem to get anything other than an error that effectively reads For variable Variable1 (Study_Group) and "p.value" statistic: variable lengths differ (found for 'group').

I've been reading through the documentation and can only find pseudo-code for similar scenarios, but not exactly this one. The mixed model above does work, but the function to extract that p-value and report it in the table for the respective variable does not. Here is a look at the general structure of my data:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 84, 84, 
85, 86, 87, 87, 88, 88, 89, 89, 90, 90, 91, 92, 93, 93, 94, 95, 
95, 96, 97, 97), Variable1 = c(6, 29, 16, 15, 19, 15, 
4, 8, 26, 9, 7, 27, 3, 31, 16, 19, 8, 12, 5, 7, 7, 12, 9, 54, 
5, 14, 4, 3, 11, 17, 7), Study_Group = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), levels = c("Group A", 
"Group B"), class = "factor")), row.names = c(NA, -31L
), class = c("tbl_df", "tbl", "data.frame"))

I recognize that the repeated measurements here are only repeated for one group, but I'm not debating statistical methodology here. I'm just trying to get this to work as it is. Any pointers?


Solution

  • You're on the right track. It looks like the issue may be in the construction of the model. In the example below, I've used reformulate() which is a great helper for creating formulas from string input.

    library(gtsummary)
    
    study_dat <-
      structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 84, 84, 
                            85, 86, 87, 87, 88, 88, 89, 89, 90, 90, 91, 92, 93, 93, 94, 95, 
                            95, 96, 97, 97), 
                     Variable1 = c(6, 29, 16, 15, 19, 15, 
                                   4, 8, 26, 9, 7, 27, 3, 31, 16, 19, 8, 12, 5, 7, 7, 12, 9, 54, 
                                   5, 14, 4, 3, 11, 17, 7), 
                     Study_Group = structure(c(1L, 1L, 
                                               1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                               2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), levels = c("Group A", 
                                                                                                               "Group B"), class = "factor")), row.names = c(NA, -31L
                                                                                                               ), 
                class = c("tbl_df", "tbl", "data.frame"))
    
    
    # you can construct a function that returns a single 
    # row data frame with the p-value in it
    # details for writing this function: https://www.danieldsjoberg.com/gtsummary/reference/tests.html#custom-functions
    my_lme_test <- function(data, by, variable, group, ...) {
      nlme::lme(
        data = data, 
        fixed = reformulate(response = variable, termlabels = by),
        random = reformulate(glue::glue("1 | {group}"))
      ) |> 
        broom.mixed::tidy() |> 
        dplyr::filter(startsWith(term, by)) # keep the row associated with treatment
    }
    # test the function
    my_lme_test(study_dat, by = 'Study_Group', variable = "Variable1", group = "ID")
    #> # A tibble: 1 × 8
    #>   effect group term               estimate std.error    df statistic p.value
    #>   <chr>  <chr> <chr>                 <dbl>     <dbl> <dbl>     <dbl>   <dbl>
    #> 1 fixed  <NA>  Study_GroupGroup B    -2.48      3.94    21    -0.631   0.535
    
    
    study_dat |> 
      select(ID, Study_Group, Variable1) |> 
      tbl_summary(
        by = Study_Group,
        include = -ID,
        type = where(is.numeric) ~ "continuous2"
      ) |> 
      add_p(test = everything() ~ my_lme_test, group = ID) |> 
      as_kable() # convert to kable to display on SO
    
    Characteristic Group A N = 14 Group B N = 17 p-value
    Variable1 0.5
    Median (Q1, Q3) 15 (9, 19) 7 (5, 16)

    Created on 2025-01-30 with reprex v2.1.1