r match propensity-score-matching matchit

Matching controls with time-dependent covariates to treated cases with varying treatment time without replacement

I want to estimate the effect of treatment X on outcome Y by matching treated and control groups so that their covariates are balanced, using R and the MatchIt package.

I'm compiling a retrospective cohort in which the treatment time varies across the treated cases. Moreover, I have multiple covariates (COV_A, COV_B, ...) that depend on the treatment time. I use a large database to mine controls and query the time-dependent covariates for a given treatment time. The sample is large: thousands of treated cases, tens of thousands of potential controls, and many covariates.

To achieve this, I used an SQL query to manually perform an "exact match" on some of the covariates as a kind of "initial matching" (for example, checking which controls have been monitored long enough to have been treated at a given time). This initial step resulted in a table with multiple rows of potential control cases for each treated case (TREAT_ID). For each row of potential controls, I mined the time-dependent covariates as of the treatment time of the corresponding treated case.

The result is a table of potential controls stratified by treated case. This means that a control case can appear more than once, with the same or a different treatment time, and its covariates change accordingly.

My intention is to use the matchit function to perform distance matching within each stratum, for example with method = "nearest" and exact = "TREAT_ID".

Simplified Example Table

CONTROL_ID	TREAT_ID	TREATMENT_TIME	COV_A	COV_B
C-1	T-1	1.5	0.6	185
C-2	T-1	1.5	0.7	123
C-3	T-1	1.5	0.8	182
C-4	T-1	1.5	0.6	185
C-1	T-2	2.2	0.9	160
C-2	T-2	2.2	1.4	150
C-5	T-2	2.2	0.9	48
C-6	T-2	2.2	3.3	113

* Notice that controls C-1 and C-2 appear twice...
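
For reproducibility, the example table can be rebuilt in R roughly as follows. This is only a minimal sketch: the object names `controls`, `n_strata` and `repeated_controls` are illustrative and not part of my actual pipeline. The last lines list the controls that are candidates for more than one treated case.

```r
# Example table of candidate controls: one row per (control, treated case) pair.
controls <- data.frame(
  CONTROL_ID     = c("C-1", "C-2", "C-3", "C-4", "C-1", "C-2", "C-5", "C-6"),
  TREAT_ID       = rep(c("T-1", "T-2"), each = 4),
  TREATMENT_TIME = rep(c(1.5, 2.2), each = 4),
  COV_A          = c(0.6, 0.7, 0.8, 0.6, 0.9, 1.4, 0.9, 3.3),
  COV_B          = c(185, 123, 182, 185, 160, 150, 48, 113),
  stringsAsFactors = FALSE
)

# Number of strata (treated cases) in which each control appears.
n_strata <- table(controls$CONTROL_ID)

# Controls that appear in more than one stratum.
repeated_controls <- names(n_strata)[n_strata > 1]
print(repeated_controls)
```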

The Question:

I want to do matching "without replacement" (each control unit is matched to only one treated unit). How can I achieve this if the initial table contains duplicates of the same control cases (some with different covariate values)?

I also want to be able to:

have control over the order of matching, beginning with the smallest stratum and moving on to larger ones
achieve this with a 1:k matching ratio as well

(Maybe my whole approach to the problem is wrong; I'd also be happy to hear different solutions...)

Solution

TL;DR: I used @Noah's suggestion and the unit.id argument.

Full solution

I combined the treated cases with the stratified control cases from the example in the question and added the MATCHING_STRATA and MATCHING_CASE columns:

ID	MATCHING_STRATA	MATCHING_CASE	TREATMENT_TIME	COV_A	COV_B
T-1	T-1	TREATED	1.5	1.2	112
C-1	T-1	CONTROL	1.5	0.6	185
C-2	T-1	CONTROL	1.5	0.7	123
C-3	T-1	CONTROL	1.5	0.8	182
C-4	T-1	CONTROL	1.5	0.6	185
T-2	T-2	TREATED	2.2	1.6	140
C-1	T-2	CONTROL	2.2	0.9	160
C-2	T-2	CONTROL	2.2	1.4	150
C-5	T-2	CONTROL	2.2	0.9	48
C-6	T-2	CONTROL	2.2	3.3	113
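
One possible way to build this stacked table is sketched below in base R. The data frames `treated`, `controls` and `controls_long` are hypothetical names with the values from the tables in this post. The extra `IS_TREATED` column is my own assumption: coding the treatment explicitly as 0/1 avoids relying on how a character column is interpreted as a treatment indicator.

```r
# Treated cases, one row each.
treated <- data.frame(
  ID = c("T-1", "T-2"),
  MATCHING_STRATA = c("T-1", "T-2"),
  MATCHING_CASE = "TREATED",
  TREATMENT_TIME = c(1.5, 2.2),
  COV_A = c(1.2, 1.6),
  COV_B = c(112, 140),
  stringsAsFactors = FALSE
)

# Candidate controls, one row per (control, treated case) pair.
controls <- data.frame(
  CONTROL_ID = c("C-1", "C-2", "C-3", "C-4", "C-1", "C-2", "C-5", "C-6"),
  TREAT_ID = rep(c("T-1", "T-2"), each = 4),
  TREATMENT_TIME = rep(c(1.5, 2.2), each = 4),
  COV_A = c(0.6, 0.7, 0.8, 0.6, 0.9, 1.4, 0.9, 3.3),
  COV_B = c(185, 123, 182, 185, 160, 150, 48, 113),
  stringsAsFactors = FALSE
)

# Bring the controls into the same column layout as the treated cases.
controls_long <- data.frame(
  ID = controls$CONTROL_ID,
  MATCHING_STRATA = controls$TREAT_ID,
  MATCHING_CASE = "CONTROL",
  TREATMENT_TIME = controls$TREATMENT_TIME,
  COV_A = controls$COV_A,
  COV_B = controls$COV_B,
  stringsAsFactors = FALSE
)

# Stack both tables; within each stratum the treated row comes first.
df <- rbind(treated, controls_long)
df <- df[order(df$MATCHING_STRATA, df$MATCHING_CASE != "TREATED"), ]
rownames(df) <- NULL

# Assumed helper column: explicit 0/1 treatment indicator.
df$IS_TREATED <- as.integer(df$MATCHING_CASE == "TREATED")
```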

I then called the matchit function with exact = "MATCHING_STRATA", so that matching is done within each stratum, and unit.id = "ID", so that no control is reused across strata:

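# MATCHING_CASE is the treatment indicator (TREATED vs. CONTROL).
MatchIt::matchit(MATCHING_CASE ~ COV_A + COV_B,
                 data = df,
                 method = "nearest",
                 exact = "MATCHING_STRATA",  # match only within each stratum
                 unit.id = "ID",             # rows sharing an ID count as one unit
                 replace = FALSE)            # each control unit is used at most once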
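
For the two extra requirements in the question (matching order and 1:k matching), a possible extension is sketched below. It is not what I ran on the real data. It assumes a MatchIt version that supports `unit.id`, uses `m.order = "data"` so that treated rows are processed in the order they appear in the data, and uses `ratio` for 1:k matching. The `IS_TREATED` and `STRATUM_SIZE` columns, the sort by stratum size, and the Mahalanobis distance (chosen only because the toy table is too small for a sensible propensity model) are all illustrative assumptions. I have not checked how `m.order` interacts with `exact` and `unit.id` in every version, so the result should be inspected.

```r
library(MatchIt)

# Toy version of the stacked table, with an explicit 0/1 treatment indicator.
df <- data.frame(
  ID = c("T-1", "C-1", "C-2", "C-3", "C-4", "T-2", "C-1", "C-2", "C-5", "C-6"),
  MATCHING_STRATA = rep(c("T-1", "T-2"), each = 5),
  IS_TREATED = rep(c(1, 0, 0, 0, 0), times = 2),
  COV_A = c(1.2, 0.6, 0.7, 0.8, 0.6, 1.6, 0.9, 1.4, 0.9, 3.3),
  COV_B = c(112, 185, 123, 182, 185, 140, 160, 150, 48, 113),
  stringsAsFactors = FALSE
)

# Sort so that the smallest strata come first; m.order = "data" then
# matches the treated cases in this order.
stratum_size <- table(df$MATCHING_STRATA)
df$STRATUM_SIZE <- as.integer(stratum_size[df$MATCHING_STRATA])
df <- df[order(df$STRATUM_SIZE, df$MATCHING_STRATA), ]

m <- matchit(IS_TREATED ~ COV_A + COV_B,
             data = df,
             method = "nearest",
             distance = "mahalanobis",
             exact = "MATCHING_STRATA",
             unit.id = "ID",
             replace = FALSE,
             m.order = "data",
             ratio = 2)  # 1:2 matching; use ratio = k for 1:k

# Check that no control unit was matched more than once across strata.
matched <- match.data(m, data = df)
matched_controls <- matched$ID[matched$IS_TREATED == 0]
print(anyDuplicated(matched_controls))  # 0 means no control was reused
```

If a stratum has fewer than k unused controls left when its turn comes, some treated cases will end up with fewer than k matches, which is one reason to start with the smallest strata.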
