Search code examples
rgtsummary

Adding p-values in tbl summary


I am trying to build on the guidance provided here: [Tests/methods available in add_p() and add_difference()][1] specifically under the section "Custom Functions" by passing a customized function in add_p() function. My goal is to have the p-values from an ANOVA test while incorporating survey design effect. The code am using is

rm(list = ls())

library(gtsummary)
library(dplyr)
library(srvyr)
library(survey)


data <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), 
                       strata = c(10, 20, 30, 10, 20, 20, 10, 20, 30, 30, 10, 30, 30, 20, 10, 20, 20, 20, 10, 20, 20, 30, 30, 20, 30), 
                       weight = c(10, 8, 17, 15, 9, 10, 25, 8, 8, 13, 17, 24, 12, 15, 3, 12, 16, 17, 24, 12, 3, 2, 8, 14, 4), 
                       popgroup = c("A", "B", "A", "C", "A", "C", "B", "B", "B", "A", "A", "C", "A", "B", "A", "C", "B", "A", "C", "B", "A", "B", "C", "B", "B"),
                       gender = c("Male", "Female", "Female", "Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Male"),
                       inc_01 = c(1500, 1200, 130, 500, 750, 2000, 10000, 1500, 1050, 400, 360, 490, 250, 400, 2500, 1300, 800, 540, 690, 520, 600, 700, 700, 600, 400), 
                       inc_02 = c(360, 450, 120, 300, 900, 560, 450, 280, 720, 360, 1000, 900, 530, 820, 640, 520, 130, 140, 150, 650, 240, 130, 200, 300, 500)), 
                  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -25L))


dclus2 <- survey::svydesign(
            id = ~ id,
            strata = ~ strata,
            weights = ~ weight,
            data = data
          )


ttest_common_variance <- function(data, variable = inc_01, by = popgroup, ...) {
                                  model<-svyglm(variable ~ by, design=dclus2)
                                  reg_v <- regTermTest(model,~ by)
                                  reg_v1 <- as.numeric(reg_v$p)
                                  reg_v2 <- tibble( p.value = reg_v1 )
}


tbl_2 <-data |>
  as_survey_design(strata = strata, weights = weight) %>%
  gtsummary::tbl_svysummary(
    by = popgroup,
    type = where(is.numeric) ~ "continuous",
    statistic = list(c(inc_01, inc_02) ~ "{mean} {mean.std.error}"),
    missing = "no",
    digits = list(c(inc_01, inc_02) ~ c(4)),
    include = c(inc_01, inc_02)
  ) |>
  gtsummary::add_p(
    test       = c(inc_01, inc_02) ~ "ttest_common_variance",
    pvalue_fun = function(x) gtsummary::style_pvalue(x, digits = 3)
  ) |>
  as_tibble()

but am getting the error

There was an error in 'add_p()/add_difference()' for variable 'inc_01', p-value omitted: Error in svyglm.survey.design(variable ~ by, design = dclus2, family = quasibinomial()): all variables must be in design= argument

Any help will be appreciated. Thanks in advance [1]: https://www.danieldsjoberg.com/gtsummary/reference/tests.html


Solution

  • You have several issues in this code. In the order that errors appear:

    Error in svyglm.survey.design(variable ~ by, design = dclus2): all variables must be in design= argument

    This is quite self-explanatory, any variables that go in your model must be in the svydesign object.

    dclus2 <- survey::svydesign(
      id = ~ id,
      strata = ~ strata,
      weights = ~ weight,
      ## Include variables measured.
      variables = ~ inc_01 + inc_02 + popgroup,
      data = data
    )
    

    All remaining problems are with your ttest_common_variance function:

    1. You're passing in variable and by but these are not existing objects, or rather, you're not actually creating a formula out of them.
    2. You're returning a tibble, but gtsummary expects a scalar. regTermTest returns a 1x1 matrix, so we can just take its first element.

    Here's a version that should work:

    ttest_common_variance <- function(data, variable = "inc_01", by = "popgroup", ...) {
      model <- svyglm(formula(paste0(variable, "~", by)), design=dclus2)
      regTermTest(model, formula(paste0("~", by)))$p[1]
    }
    
    ... pipeline skipped
    
    #> `**Character**` `**A**, N = 101`  `**B**, N = 112`      `**C**, N = 93`   `**p-value**`
    #> <chr>           <chr>             <chr>                 <chr>             <chr>        
    #> inc_01          561.9802 137.7631 2,825.3571 1,669.4621 828.1720 169.7938 0.305        
    #> inc_02          463.3663 130.1409 459.7321 80.4854      463.8710 143.0381 >0.999