I am working in R using gtsummary, trying to report the p-value from the random intercept model
lme(Variable1 ~ Study_Group, random = ~1|ID, na.action = na.omit, data = study_dat)
in a summary/comparison table using
lme_stat_function <- function(data, variable, by, group, ...){
nlme::lme(variable ~ by, random = ~1|group, na.action = na.omit, data = data) %>%
broom.mixed::tidy()
}
lme_stat_render <- list(
all_continuous() ~ "lme_stat_function"
)
study_dat %>%
select(ID, Study_Group, Variable1) %>%
tbl_summary(
by = Study_Group,
type = list(
where(is.numeric) ~ "continuous2"
)
) %>%
bold_labels() %>%
italicize_levels() %>%
add_p(
test = lme_stat_render
)
I have been wrestling with this seemingly simple issue for days and I cannot seem to get anything other than an error that effectively reads For variable Variable1 (Study_Group) and "p.value" statistic: variable lengths differ (found for 'group')
.
I've been reading through the documentation and can only find pseudo-code for similar scenarios, but not exactly this one. The mixed model above does work, but the function to extract that p-value and report it in the table for the respective variable does not. Here is a look at the general structure of my data:
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 84, 84,
85, 86, 87, 87, 88, 88, 89, 89, 90, 90, 91, 92, 93, 93, 94, 95,
95, 96, 97, 97), Variable1 = c(6, 29, 16, 15, 19, 15,
4, 8, 26, 9, 7, 27, 3, 31, 16, 19, 8, 12, 5, 7, 7, 12, 9, 54,
5, 14, 4, 3, 11, 17, 7), Study_Group = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), levels = c("Group A",
"Group B"), class = "factor")), row.names = c(NA, -31L
), class = c("tbl_df", "tbl", "data.frame"))
I recognize that the repeated measurements here are only repeated for one group, but I'm not debating statistical methodology here. I'm just trying to get this to work as it is. Any pointers?
You're on the right track. It looks like the issue may be in the construction of the model. In the example below, I've used reformulate()
which is a great helper for creating formulas from string input.
library(gtsummary)
study_dat <-
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 84, 84,
85, 86, 87, 87, 88, 88, 89, 89, 90, 90, 91, 92, 93, 93, 94, 95,
95, 96, 97, 97),
Variable1 = c(6, 29, 16, 15, 19, 15,
4, 8, 26, 9, 7, 27, 3, 31, 16, 19, 8, 12, 5, 7, 7, 12, 9, 54,
5, 14, 4, 3, 11, 17, 7),
Study_Group = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), levels = c("Group A",
"Group B"), class = "factor")), row.names = c(NA, -31L
),
class = c("tbl_df", "tbl", "data.frame"))
# you can construct a function that returns a single
# row data frame with the p-value in it
# details for writing this function: https://www.danieldsjoberg.com/gtsummary/reference/tests.html#custom-functions
my_lme_test <- function(data, by, variable, group, ...) {
nlme::lme(
data = data,
fixed = reformulate(response = variable, termlabels = by),
random = reformulate(glue::glue("1 | {group}"))
) |>
broom.mixed::tidy() |>
dplyr::filter(startsWith(term, by)) # keep the row associated with treatment
}
# test the function
my_lme_test(study_dat, by = 'Study_Group', variable = "Variable1", group = "ID")
#> # A tibble: 1 × 8
#> effect group term estimate std.error df statistic p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 fixed <NA> Study_GroupGroup B -2.48 3.94 21 -0.631 0.535
study_dat |>
select(ID, Study_Group, Variable1) |>
tbl_summary(
by = Study_Group,
include = -ID,
type = where(is.numeric) ~ "continuous2"
) |>
add_p(test = everything() ~ my_lme_test, group = ID) |>
as_kable() # convert to kable to display on SO
Characteristic | Group A N = 14 | Group B N = 17 | p-value |
---|---|---|---|
Variable1 | 0.5 | ||
Median (Q1, Q3) | 15 (9, 19) | 7 (5, 16) |
Created on 2025-01-30 with reprex v2.1.1