Tags: r, regression, propensity-score-matching

Does `MatchIt::matchit()` not support excluding variables in the formula?


My goal is to exclude net_tfa from the formula. Nevertheless, I would like to use this variable in subsequent steps. Why does matchit() not accept excluding a variable with -variable? How can I work around this?

A comparison with lm():

# toy data
# install.packages("hdm")
library(hdm); data("pension")

# allowing convergence
pension <- pension[, c("p401", "net_tfa", "age", "db", 
                       "educ","fsize", "hown", "inc", 
                       "male", "marr","pira", "twoearn")]

lm(p401 ~ . -net_tfa,
   data = pension)
#> 
#> Call:
#> lm(formula = p401 ~ . - net_tfa, data = pension)
#> 
#> Coefficients:
#> (Intercept)          age           db         educ        fsize         hown  
#>   9.865e-02   -1.718e-03    1.174e-01    3.975e-04   -3.112e-03    6.556e-02  
#>         inc         male         marr         pira      twoearn  
#>   3.967e-06   -1.347e-02   -1.027e-02    6.407e-02    2.519e-02

# install.packages("MatchIt")
library(MatchIt)
matchit(formula = p401 ~ . -net_tfa,
        data = pension,
        method = "nearest",
        link = "probit",
        estimand = "ATT",
        replace = TRUE)
#> A matchit object
#>  - method: 1:1 nearest neighbor matching with replacement
#>  - distance: Propensity score
#>              - estimated with probit regression
#>  - number of obs.: 9915 (original), 4435 (matched)
#>  - target estimand: ATT
#>  - covariates: net_tfa, age, db, educ, fsize, hown, inc, male, marr, pira, twoearn

Created on 2022-12-29 with reprex v2.0.2

As you can see, net_tfa is still listed as a covariate in the latter case.


In case I missed tags, please edit.


Solution

  • If you don't include the variable in the main formula, it will still appear in the matched dataset produced by match.data(), which is just a copy of your original dataset with unmatched observations dropped and additional columns added. It won't appear in the summary() output, but you can always request it using the addlvariables argument. So just include the variables you want to match on in the main formula.

    If you are using the formula to estimate a propensity score, then using -var does exclude var from the propensity score model since the formula is passed directly to the model fitting function (by default, glm()). But the variable will still show up in the balance output because that relies on model.frame(), which includes all variables in the formula even if removed using -. The print() method for matchit objects prints the variables that will appear in the balance summary.
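The difference between what the model fits and what model.frame() keeps can be checked in base R, without MatchIt. The following is only a minimal sketch under stated assumptions: `toy`, `a`, `b` and `y` are made-up stand-ins for `pension` and its columns, and the probit glm() merely mimics the default propensity score model described above. It also shows one possible workaround, building the formula from the column names with reformulate() so that the unwanted variable never enters the formula at all.

```r
# Hypothetical toy data standing in for `pension`:
# y plays the role of p401, b the role of net_tfa.
set.seed(1)
n <- 200
toy <- data.frame(a = rnorm(n), b = rnorm(n))
toy$y <- rbinom(n, 1, pnorm(0.5 * toy$a))

# Formula that removes b with a minus sign, as in the question
f_minus <- y ~ . - b

# The fitted model has no coefficient for b ...
fit <- glm(f_minus, data = toy, family = binomial(link = "probit"))
coef_names <- names(coef(fit))

# ... but the model frame built from the same formula still contains b
mf_minus_names <- names(model.frame(f_minus, data = toy))

# Possible workaround: list the wanted covariates explicitly
covs <- setdiff(names(toy), c("y", "b"))
f_explicit <- reformulate(covs, response = "y")
mf_explicit_names <- names(model.frame(f_explicit, data = toy))

print(coef_names)
print(mf_minus_names)
print(f_explicit)
print(mf_explicit_names)
```

With the real data, the analogous formula would be built from `names(pension)` with `"p401"` and `"net_tfa"` dropped and then passed as the `formula` argument of matchit(). net_tfa stays in the data, so it remains available in the output of match.data().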