Tags: r, imputation, r-mice, propensity-score-matching

Propensity score matching with multiple imputation


I have a dataset with a few missing values and need to run propensity score matching, using the variable 'y' as the treatment variable and x1, x2, and x3 as adjustment covariates. When I run the following code with MatchIt:

ModMatch <- matchit(y ~ x1+x2+x3, method = 'nearest', data = data)

I get the error 'Missing values exist in the data'.

I have therefore tried to run a multiple imputation using mice:

ImputedDF <- mice(data)
ModMatch <- matchit(y ~ x1+x2+x3, method = 'nearest', data = ImputedDF)

Now I get the error 'cannot coerce an object of class mids to a dataframe'. I probably need a way to get an imputed data frame out of this object. Does anyone know if that is possible?


Solution

  • You should use the MatchThem package, which was specifically designed for performing matching after multiple imputation. The matchthem() function calls matchit() and performs matching within each imputed dataset. You can then check balance across the imputed datasets using the cobalt package, which was designed to be compatible with MatchThem. Afterward, you can use the with() function in MatchThem to fit the outcome model in each matched dataset and pool() to combine the estimates. Here's an example of this workflow:

    library(mice); library(MatchThem); library(cobalt); library(survey)
    
    #Impute the data with 20 imputations (more is better)
    imp <- mice(data, m = 20)
    
    #Perform matching within each imputation
    ModMatch <- matchthem(y ~ x1 + x2 + x3, datasets = imp, method = 'nearest')
    
    #Assess balance
    bal.tab(ModMatch, un = TRUE)
    love.plot(ModMatch)
    
    #Estimate the effect
    summary(pool(with(ModMatch, svyglm(outcome ~ y + x1 + x2 + x3))))
    

    I would caution you that you are using advanced statistical techniques that should not be used by someone without advanced training. Using the defaults in mice and MatchThem is rarely a good idea.
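
As a rough illustration of moving away from the defaults, the sketch below spells out the imputation and matching settings explicitly. The toy data frame, the seeds, the number of imputations, and the caliper and ratio values are all placeholders invented for this example, not recommendations; appropriate choices depend on your data and on why values are missing.

```r
library(mice)
library(MatchThem)

# Hypothetical toy data: binary treatment y, covariates x1-x3, continuous outcome
set.seed(42)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.8 + 0.5 * toy$x1 - 0.3 * toy$x2))
toy$outcome <- 1 + 0.5 * toy$y + toy$x1 + rnorm(n)

# Introduce some missing covariate values
toy$x1[sample(n, 30)] <- NA
toy$x2[sample(n, 30)] <- NA

# Imputation settings stated explicitly instead of relying on mice() defaults
# (m, method, and seed here are illustrative values only)
imp <- mice(toy, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Matching settings stated explicitly; caliper and ratio are passed on to matchit()
matched <- matchthem(y ~ x1 + x2 + x3, datasets = imp,
                     approach = "within", method = "nearest",
                     caliper = 0.2, ratio = 1)

class(matched)
```

Whatever settings you choose, check the resulting balance with bal.tab() before estimating anything.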

    Regarding the error messages you were getting: the output of a call to mice() is not a data frame; it's a mids object. The data argument in matchit() requires a data frame. matchthem() accepts a mids object to perform the matching within each imputed dataset.
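
To answer the literal question about getting a data frame out of the imputation object: mice provides complete() for this. The sketch below uses the nhanes example data that ships with mice (a stand-in for your own data) to show the class of the mice() output and how to extract one or all of the completed datasets.

```r
library(mice)

# nhanes is a small example dataset bundled with mice; it contains missing values
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# The result is a mids object, not a data frame
class(imp)

# Extract the first completed (imputed) dataset as an ordinary data frame
first <- mice::complete(imp, 1)
is.data.frame(first)
anyNA(first)

# Stack all completed datasets in long format, identified by .imp and .id
long <- mice::complete(imp, action = "long")
head(long)
```

Passing a single completed data frame such as first to matchit() would run without the error, but it would use only one imputation and ignore the uncertainty from the missing data, which is why matching on the mids object with matchthem() is the approach recommended above.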