Tags: r, linear-regression, gtsummary

Calculating a global p-value for a linear regression model on multiply imputed datasets using tbl_regression()


I am trying to calculate a global p-value with the gtsummary package on my multiply imputed datasets, but I haven't found a way to do it. I know that gtsummary can produce linear regression tables from multiply imputed datasets, but I don't think add_global_p() has been set up for this. I know that calculating an ANOVA for MI datasets requires mi.anova() from the miceadds package, and that gtsummary uses the car::Anova() function. Does anyone have a solution for this?

# loads relevant packages using the pacman package
pacman::p_load(
  tidyverse,   # data management and visualization
  mice,        # for multiple imputation
  gtsummary)   # for tables

# generate a small sample of the boys dataset for MI
boys_miss <- sample(head(boys,100))

# impute a sample of the boys dataset 
boys_imp <- parlmice(boys_miss,
                        m = 5,
                        maxit = 5,
                        cluster.seed = 1234)

# run linear regression on the imputed dataset
boys_imp %>% 
  with(.,
       lm(wgt ~ reg)
  ) %>% 
  tbl_regression() %>% 
  add_global_p() # when I add this function, I get the below error


x `add_global_p()` uses `car::Anova()` to calculate the global p-value,
and the function returned an error while calculating the p-values.
Is your model type supported by `car::Anova()`?
  Error in UseMethod("vcov") : 
  no applicable method for 'vcov' applied to an object of class "c('mira', 'matrix')"
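
The last line of that error is an S3 dispatch failure: `vcov()` is called on the container of fitted models rather than on a single `lm` fit. A minimal base R sketch of the same failure follows; `fake_mira` is a hypothetical stand-in that only carries the class vector quoted in the error message, not a real mice result.

```r
# Hypothetical stand-in: an empty list that only carries the class vector
# reported in the error message (not a real mice result).
fake_mira <- structure(list(), class = c("mira", "matrix"))

# stats::vcov() is an S3 generic; capture the dispatch failure as text
# instead of stopping the script.
msg <- tryCatch(
  {
    stats::vcov(fake_mira)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

Because no `vcov()` method is registered for either class, any function that relies on `vcov()` fails on the object as a whole, which is consistent with the error shown above.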

I would like the table to look something like this: [image of the desired table not preserved]


Solution

  • You'll need to calculate the global p-value yourself and add it to the gtsummary table using modify_table_body(). Example below!

    # loads relevant packages using the pacman package
    pacman::p_load(
      tidyverse,   # data management and visualization
      mice,        # for multiple imputation
      gtsummary)   # for tables
    
    # generate a small sample of the boys dataset for MI
    boys_miss <- sample(head(boys,100))
    
    # impute a sample of the boys dataset 
    boys_imp <- parlmice(boys_miss,
                         m = 5,
                         maxit = 5,
                         cluster.seed = 1234)
    
    
    tbl <- 
      # build linear regression on the imputed dataset
      boys_imp %>% 
      with(lm(wgt ~ reg)) %>% 
      tbl_regression() %>%
      # replace individual p-values with global p-value
      modify_table_body(
        ~ .x %>% 
          select(-p.value) %>%
          full_join(
            miceadds::mi.anova(boys_imp, formula = "wgt ~ reg", type = 2) %>%
              as.data.frame() %>%
              tibble::rownames_to_column(var = "variable") %>%
              # trim any padding in the row names before dropping the residual row
              mutate(variable = str_trim(variable),
                     row_type = "label") %>%
              filter(variable != "Residual") %>%
              select(variable, row_type, p.value = anova.table.Pr..F.),
            by = c("variable", "row_type")
          )
      )
    #> pool_and_tidy_mice(): Tidying mice model with
    #> `mice::pool(x) %>% mice::tidy(exponentiate = FALSE, conf.int = TRUE, conf.level = 0.95)`
    #> Univariate ANOVA for Multiply Imputed Data (Type 2)  
    #> 
    #> lm Formula:  wgt ~ reg
    #> R^2=0.0619 
    #> ..........................................................................
    #> ANOVA Table 
    #>                   SSQ df1      df2 F value  Pr(>F)    eta2 partial.eta2
    #> reg          10.65048   4 3461.232  1.4994 0.19958 0.06191      0.06191
    #> Residual    161.39186  NA       NA      NA      NA      NA           NA
    

    Created on 2021-08-05 by the reprex package (v2.0.0)
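
The column name `anova.table.Pr..F.` used in the `select()` call above is not typed anywhere by hand; it appears when `as.data.frame()` flattens a list and makes the names syntactic. Here is a minimal base R sketch of that renaming. `toy` is a hypothetical stand-in, assumed to mimic the shape of the `mi.anova()` return value (a scalar plus an ANOVA table), with numbers copied from the printed output above.

```r
# Hypothetical stand-in for the list returned by mi.anova():
# a scalar R^2 plus a data frame holding the ANOVA table.
toy <- list(
  r.squared = 0.0619,
  anova.table = data.frame(
    SSQ = c(10.65048, 161.39186),
    `Pr(>F)` = c(0.19958, NA),
    check.names = FALSE,               # keep "Pr(>F)" as printed
    row.names = c("reg", "Residual")
  )
)

# Flattening the list prefixes each table column with the element name
# and replaces characters that are not valid in R names with dots.
flat <- as.data.frame(toy)
print(names(flat))

# The same renaming rule applied directly to one name.
print(make.names("anova.table.Pr(>F)"))
```

If the column is ever not found, running `names()` on the flattened result, as in this sketch, shows what to select instead.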