r matchit weightit

Incorporating survey weights with weightthem in R


I am attempting to incorporate survey/sampling weights while using CBPS and weightthem() (from the MatchThem package) after multiple imputation. The basic code structure of my approach is:

library(mice)       # mice(), pool()
library(MatchThem)  # weightthem()

# MICE
var_list <- c("outcome", "x1", "x2", "x3", "x4")
tempData_pool <- mice(d2, m = 10, maxit = 20, seed = 100, include = var_list)

# Weights (estimated separately within each imputed dataset)
weighted_pool_data <- weightthem(x1 ~ x2 + x3 + x4,
                                 data = tempData_pool,
                                 method = "cbps",
                                 estimand = "ATT")

# Model
weighted_pool_model <- with(weighted_pool_data,
                        estimatr::lm_robust(outcome ~ x1 + x2 + x3 + x4))
weighted_pool_results <- pool(weighted_pool_model)

After looking at the CBPS documentation (https://cran.r-project.org/web/packages/CBPS/CBPS.pdf), I've attempted to incorporate survey weights into the regression model like this:

weighted_pool_model <- with(weighted_pool_data,
                        estimatr::lm_robust(outcome ~ x1 + x2 + x3 + x4), sample.weight = wgtvar)

weighted_pool_results <- pool(weighted_pool_model)

And while that doesn't cause an error, none of the coefficients or SEs seem to change at all, which makes me think it's not actually using the survey weights.

My guess is that this is because the survey weight variable (wgtvar) is not included in the imputation model, so perhaps it's not carrying through from the original dataframe (d2) into the imputed data object that gets used with weightthem()?

EDIT: The approach below would work with survey::svyglm in place of lm_robust, but because I'm using imputation, the data is a wimids object rather than a data frame, so as far as I can tell svydesign() doesn't accept it.

design <- survey::svydesign(
  ids = ~1, 
  weights = ~wgtvar,
  data = weighted_pool_data)

Solution

  • As of v1.0.0, WeightIt provides the function glm_weightit(). When supplied with a weightit object that contains survey weights, it automatically incorporates them into estimation of the outcome model parameters and provides a variance matrix that correctly adjusts for them. You can use glm_weightit() with MatchThem version 1.2.1 and greater.

    First, you should make sure your survey weights are included in the estimation of the weights, i.e., by using the s.weights argument. Then you should just be able to use glm_weightit() to fit the outcome model in each imputed dataset.

    library("mice")
    library("MatchThem")  # weightthem() and with() for wimids objects
    weighted_pool_data <- weightthem(x1 ~ x2 + x3 + x4,
                                     data = tempData_pool, 
                                     method = "cbps", 
                                     estimand = "ATT",
                                     s.weights = "wgtvar")
    
    cobalt::bal.tab(weighted_pool_data, abs = TRUE)
    
    fit <- with(weighted_pool_data,
                WeightIt::glm_weightit(outcome ~ x1 + x2 + x3 + x4))
    
    fit |> pool() |> summary()
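
As for why the attempt in the question left the estimates unchanged: a likely reason is that `sample.weight = wgtvar` sits outside the `lm_robust()` parentheses, so it is handed to `with()` rather than to the model function. (As far as I know, `lm_robust()` names its weights argument `weights`, and `sample.weights` is an argument of `CBPS::CBPS()`.) The sketch below shows the same thing with base R only. The data and the names `y`, `x` and `w` are made up for illustration, and it assumes `with()` for wimids objects treats extra arguments the way base R's `with()` does.

```r
# Simulated data; y, x and w are hypothetical stand-ins for the real variables.
set.seed(1)
df <- data.frame(x = rnorm(200))
df$y <- 1 + 2 * df$x + rnorm(200, sd = 1 + abs(df$x))
df$w <- runif(200, 0.2, 5)  # stand-in for a survey weight

# No weights at all.
fit_unweighted <- with(df, lm(y ~ x))

# Weights placed OUTSIDE lm(): the argument goes to with(), which never
# evaluates it, so there is no error and no effect on the fit.
fit_misplaced <- with(df, lm(y ~ x), weights = w)

# Weights placed INSIDE lm(): the model is actually weighted.
fit_weighted <- with(df, lm(y ~ x, weights = w))

# Compare the coefficients of the three fits side by side.
print(rbind(unweighted = coef(fit_unweighted),
            misplaced  = coef(fit_misplaced),
            weighted   = coef(fit_weighted)))
```

The first two rows should match exactly, which is the "no error, nothing changes" behavior described in the question.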
    

    G-computation is a bit harder because you need to include the survey weights in the standardization and subset your data to target the ATT. Right now, this has to be done manually:

    library("marginaleffects")
    
    lapply(1:tempData_pool$m, function(i) {
      avg_comparisons(fit$analyses[[i]], variables = "x1",
                      newdata = subset(complete(tempData_pool, i), x1 == 1),
                      wts = "wgtvar")
    }) |> mice::pool() |> summary()
    

    (In an upcoming version of marginaleffects, you should be able to omit the complete(tempData_pool, i) from the subset() call.)
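
To make concrete what "include the survey weights in the standardization and subset your data to target the ATT" means, here is a minimal base R sketch of the point estimate that the avg_comparisons() call above computes within one dataset. Everything in it is an assumption for illustration: the data are simulated, `sw` is a hypothetical survey weight, and a plain `lm()` stands in for the `glm_weightit()` fit. It does not produce standard errors or pool across imputations.

```r
# Simulated single dataset; all names and values here are hypothetical.
set.seed(42)
n <- 500
d <- data.frame(x2 = rnorm(n), sw = runif(n, 0.5, 3))  # sw: survey weight
d$x1 <- rbinom(n, 1, plogis(0.5 * d$x2))               # binary treatment
d$outcome <- 1 + 2 * d$x1 + d$x2 + 0.5 * d$x1 * d$x2 + rnorm(n)

# Outcome model (stand-in for the weighted model fitted in each imputation).
fit_lm <- lm(outcome ~ x1 * x2, data = d)

# Target the ATT: keep only the treated units.
treated <- subset(d, x1 == 1)

# Predict each treated unit's outcome under treatment and under control.
p1 <- predict(fit_lm, newdata = transform(treated, x1 = 1))
p0 <- predict(fit_lm, newdata = transform(treated, x1 = 0))

# Standardize: survey-weighted average of the unit-level differences.
att_hat <- weighted.mean(p1 - p0, w = treated$sw)
print(att_hat)
```

Dropping the `subset()` step would average over everyone instead of the treated, and dropping `w` would ignore the survey design in the averaging step. Those are the two things the `newdata` and `wts` arguments take care of above.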