My goal is to exclude net_tfa from the formula. Nevertheless, I would like to use this variable in subsequent steps. Why does matchit() not accept variable exclusion by -variable? How can I overcome this?
A comparison with lm():
# toy data
# install.packages("hdm")
library(hdm); data("pension")
# keep only a subset of columns so that the models converge
pension <- pension[, c("p401", "net_tfa", "age", "db",
"educ","fsize", "hown", "inc",
"male", "marr","pira", "twoearn")]
lm(p401 ~ . -net_tfa,
data = pension)
#>
#> Call:
#> lm(formula = p401 ~ . - net_tfa, data = pension)
#>
#> Coefficients:
#> (Intercept) age db educ fsize hown
#> 9.865e-02 -1.718e-03 1.174e-01 3.975e-04 -3.112e-03 6.556e-02
#> inc male marr pira twoearn
#> 3.967e-06 -1.347e-02 -1.027e-02 6.407e-02 2.519e-02
# install.packages("MatchIt")
library(MatchIt)
matchit(formula = p401 ~ . -net_tfa,
data = pension,
method = "nearest",
link = "probit",
estimand = "ATT",
replace = TRUE)
#> A matchit object
#> - method: 1:1 nearest neighbor matching with replacement
#> - distance: Propensity score
#> - estimated with probit regression
#> - number of obs.: 9915 (original), 4435 (matched)
#> - target estimand: ATT
#> - covariates: net_tfa, age, db, educ, fsize, hown, inc, male, marr, pira, twoearn
Created on 2022-12-29 with reprex v2.0.2
As you can see, net_tfa is listed as a covariate in the latter case.
If you don't include the variable in the main formula, it will still appear in the matched dataset produced by match.data(), which is just a copy of your original dataset with unmatched observations dropped and additional columns added. It won't appear in the summary() output, but you can always request it there using the addlvariables argument. So include in the main formula only the variables you want to match on.
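One way to follow this advice without typing every covariate by hand is to build the formula from the column names. The following is only a sketch: it uses a small simulated data frame (dat, treat, and the covariate values are made up for illustration, not the pension data from the question), and the MatchIt calls run only if the package is installed.

```r
# Hypothetical toy data standing in for the question's pension data
set.seed(1)
n <- 200
dat <- data.frame(
  treat   = rbinom(n, 1, 0.4),    # treatment indicator (plays the role of p401)
  net_tfa = rnorm(n),             # variable to keep out of the matching formula
  age     = rnorm(n, 40, 10),
  inc     = rnorm(n, 30000, 5000)
)

# Every column except the treatment and the variable to hold back
covs <- setdiff(names(dat), c("treat", "net_tfa"))

# Build treat ~ age + inc, so net_tfa never enters the formula at all
f <- reformulate(covs, response = "treat")
print(f)

if (requireNamespace("MatchIt", quietly = TRUE)) {
  m <- MatchIt::matchit(f, data = dat,
                        method = "nearest", link = "probit",
                        estimand = "ATT", replace = TRUE)

  # The matched dataset is a copy of dat, so net_tfa is still available
  md <- MatchIt::match.data(m)
  print("net_tfa" %in% names(md))

  # Balance on net_tfa can still be requested explicitly
  print(summary(m, addlvariables = ~ net_tfa, data = dat))
}
```

With the data from the question, the same pattern would use names(pension) and "p401" as the response.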
If you are using the formula to estimate a propensity score, then using -var does exclude var from the propensity score model, since the formula is passed directly to the model-fitting function (by default, glm()). But the variable will still show up in the balance output because that relies on model.frame(), which includes all variables in the formula even if removed using -. The print() method for matchit objects prints the variables that will appear in the balance summary, which is why net_tfa is listed under covariates in your output.
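The difference between the terms that enter the model and the variables kept by model.frame() can be checked in base R. This is a minimal sketch on a made-up four-row data frame (d, y, x1, x2 are illustrative names only); it does not call MatchIt itself.

```r
# Hypothetical toy data
d <- data.frame(y = c(0, 1, 0, 1), x1 = 1:4, x2 = c(2, 1, 4, 3))

f <- y ~ . - x1

# Terms that actually enter the fitted model: x1 has been removed
tt <- terms(f, data = d)
print(attr(tt, "term.labels"))

# Columns of the model frame: x1 is still carried along
mf <- model.frame(f, data = d)
print(names(mf))
```

The first call reports only x2, while the model frame still contains y, x1, and x2, which matches the behaviour described above.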