
Combining matchit objects for descriptive analysis (CRAN/R)


What I want to do: I am working with MatchIt in R. Because matching is slow and I have many thousands of observations, computational constraints forced me to split my propensity score matching by year, with one matchit() call per year. I know I can easily extract the resulting matched data with match.data() and run analyses on it, but that doesn't help me assess how "good" the matches are.

I want to bind two matchit objects together so that I can run plot([bound_matchit], which.xs = ~ date + race + age + wages, type = 'density') and similar commands to assess how well the treated and untreated groups match one another across my entire panel.

Here is a reproducible example:

library(dplyr)
library(MatchIt)

# Generate data for dataframe 1
set.seed(1) # for reproducibility
df1 <- data.frame(
  year = rep(1990, 20),
  ID = 1:20,
  date = sample(seq(as.Date('1990/01/01'), as.Date('1990/12/31'), by="day"), 20),
  race = sample(c("White", "Black", "Asian", "Hispanic"), 20, replace=TRUE),
  age = sample(20:60, 20, replace=TRUE),
  wages = sample(20000:80000, 20),
  treated = rep(c(0,1),10)
)
# Generate data for dataframe 2
set.seed(2) # for reproducibility
df2 <- data.frame(
  year = rep(1991, 20),
  ID = 21:40,
  date = sample(seq(as.Date('1991/01/01'), as.Date('1991/12/31'), by="day"), 20),
  race = sample(c("White", "Black", "Asian", "Hispanic"), 20, replace=TRUE),
  age = sample(20:60, 20, replace=TRUE),
  wages = sample(20000:80000, 20),
  treated = rep(c(1,0),10)
)

df1 <- df1 %>% dplyr::mutate(date = as.numeric(date))
df2 <- df2 %>% dplyr::mutate(date = as.numeric(date))

treats <- matchit(treated ~ date + race + age + wages,
                  method = 'nearest', ratio = 1, data = df1)

treats2 <- matchit(treated ~ date + race + age + wages,
                  method = 'nearest', ratio = 1, data = df2)

I want to bind treats and treats2 so that I can produce density plots, eCDF plots, etc., that show how well the two matchit objects, taken together, describe the underlying data. I realize that I am essentially stacking n individual matchit objects, each of which ran its own analysis, and then running inference on the stack, which may be problematic. But my intuition is that I am simply assessing what the densities look like when all the data are stacked together (i.e., when we have more draws) rather than kept separate. Is there any way to do this?

Any help is appreciated!


Solution

  • This can't be done by combining two matchit objects. Instead, you can either extract the matching weights from each object and append them to your original dataset, or combine multiple match.data() outputs in which all units are retained. Both approaches should yield the same final dataset, except that the match.data() output will also contain the propensity score and the subclass/pair membership.

    To request that no units are dropped when using match.data(), set drop.unmatched = FALSE. Then combine all the matched datasets and use the formula method of bal.tab(), bal.plot(), or other functions in cobalt. Here is how this would look:

    library(cobalt)

    # Stack the per-year matched data, keeping unmatched units (weight 0)
    dat <- rbind(match.data(treats, drop.unmatched = FALSE),
                 match.data(treats2, drop.unmatched = FALSE))
    
    # Balance across the stacked data, using the matching weights
    bal.tab(treated ~ date + race + age + wages, data = dat,
            weights = "weights", distance = "distance")
    

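    Note that your example dataset isn't very good for displaying this because no matching actually takes place: each year has 10 treated and 10 control units, so 1:1 nearest-neighbor matching keeps every unit and the matched sample is the same as the original. But you can, e.g., split the lalonde dataset in two, do the matching in each half, and reunite them as above.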
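
The lalonde suggestion above could be sketched roughly as follows. This is only an illustration, assuming MatchIt (4.0 or later) and cobalt are installed. The `half` column, the random split, and the chosen covariates are hypothetical stand-ins for the per-year split in the question. It builds the combined data both ways described in the answer (appending weights to the original rows, and stacking match.data() output).

```r
library(MatchIt)
library(cobalt)

data("lalonde", package = "MatchIt")

# Hypothetical split: an arbitrary two-way split standing in for "year"
set.seed(123)
lalonde$half <- sample(rep(1:2, length.out = nrow(lalonde)))
h1 <- subset(lalonde, half == 1)
h2 <- subset(lalonde, half == 2)

# Illustrative covariate set, not taken from the question
f <- treat ~ age + educ + married + re74 + re75

# Match separately within each half
m1 <- matchit(f, data = h1, method = "nearest", ratio = 1)
m2 <- matchit(f, data = h2, method = "nearest", ratio = 1)

# Approach 1: append each object's matching weights to its original rows
h1$weights <- unname(m1$weights)
h2$weights <- unname(m2$weights)
dat_w <- rbind(h1, h2)

# Approach 2: stack match.data() output, keeping unmatched units (weight 0)
dat <- rbind(match.data(m1, drop.unmatched = FALSE),
             match.data(m2, drop.unmatched = FALSE))

# Balance before and after matching across the stacked data
bal.tab(f, data = dat, weights = "weights", distance = "distance", un = TRUE)

# Density of one covariate, unadjusted and adjusted
bal.plot(f, data = dat, var.name = "age", weights = "weights", which = "both")
```

Unmatched units carry a weight of 0, so they inform the unadjusted ("before") columns and panels while dropping out of the adjusted ones. Recent MatchIt versions document an rbind() method for match.data() output that keeps pair labels distinct across objects; if you rely on subclass downstream, it is worth checking that the labels did not collide.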