Search code examples
rsurveyr-micemarginal-effects

Pooled average marginal effects from survey-weighted and multiple-imputed data


I am working with survey data and their associated weights, in addition to missing data that I imputed using mice(). The model I'm eventually running contains complex interactions between variables for which I want the average marginal effect.

This task seems trivial in STATA, but I'd rather stay in R since that's what I know best. It seems easy to retrieve AME's for each separate imputed dataset and average the estimates. However, I need to make use of pool() (from mice) to make sure I'm getting the correct standard errors.

Here is a reproducible example:

library(tidyverse)
library(survey)
library(mice)
library(margins)

df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
             weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
             x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
             x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9))

Using margins() on a simple (non-multiple) svyglm works without a hitch. Running svyglm on each imputation using which() and pooling the results also works well.

m <- with(surv_obj, svyglm(y ~ x1 * x2))
pool(m)

However, wrapping margins() into which() returns an error "Error in .svycheck(design) : argument "design" is missing, with no default"

with(surv_obj, margins(svyglm(y ~ x1 * x2), design = surv_obj))

If I specify the design in the svyglm call, I get "Error in UseMethod("svyglm", design) : no applicable method for 'svyglm' applied to an object of class "svyimputationList""

with(surv_obj, margins(svyglm(y ~ x1 * x2, design = surv_obj), design = surv_obj))

If I drop the survey layer, and simply try to run the margins on each imputed set and then pool, I get a warning: "Warning in get.dfcom(object, dfcom) : Infinite sample size assumed.".

m1 <- with(imputed_df, margins(lm(y ~ x1 * x2)))
pool(m1)

This worries me given that pool() may use sample size in its calculations.

Does anyone know of any method to either (a) use which(), margins() and pool() to retrieve the pooled average marginal effects or (b) knows what elements of margins() I should pass to pool() (or pool.scalar()) to achieve the desired result?


Solution

  • Update following Vincent's comment

    Wanted to update this post following Vincent's comment and related package marginaleffects() which ended up fixing my issue. Hopefully, this will be helpful to others stuck on similar problems.

    I implemented the code in the vignette linked in Vincent's comment, adding a few steps that allow for survey weighting and modeling. It's worth noting that svydesign() will drop any observations missing on clustering/weighting variables, so marginaleffects() can't predict values back unto the original "dat" data and will throw up an error. Pooling my actual data still throws up an "infinite sample size assumed", which (as noted) should be fine but I'm still looking into fixes.

    library(tidyverse)
    library(survey)
    library(mice)
    library(marginaleffects)
    
    fit_reg <- function(dat) {
      
        svy <- svydesign(ids = ~ 1, cluster = ~ region, weight = ~weight, data = dat)
        mod <- svyglm(y ~ x1 + x2*factor(x3), design = svy)
        out <- marginaleffects(mod, newdata = dat)
        
        class(out) <- c("custom", class(out))
        return(out)
    }
    
    tidy.custom <- function(x, ...) {
        out <- marginaleffects:::tidy.marginaleffects(x, ...)
        out$term <- paste(out$term, out$contrast)
        return(out)
    }
    
    df <- tibble(y = c(0, 5, 0, 4, 0, 1, 2, 3, 1, 12), region = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3), 
                 weight = c(7213, 2142, 1331, 4342, 9843, 1231, 1235, 2131, 7548, 2348), 
                 x1 = c(1.14, 2.42, -0.34, 0.12, -0.9, -1.2, 0.67, 1.24, 0.25, -0.3),
                 x2 = c(12, NA, 10, NA, NA, 12, 11, 8, 9, 9),
                 x3 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
    
    imputed_df <- mice(df, m = 2, seed = 123)
    
    dat_mice <- complete(imputed_df, "all")
    mod_imputation <- lapply(dat_mice, fit_reg)
    mod_imputation <- pool(mod_imputation)
    
    summary(mod_imputation)