
Matching controls and cases using some conditions

I want to match up to 3 controls to each case, subject to three conditions:

  1. The age should be within ±3 years of the case's age
  2. The gender should be the same
  3. The pracid should be the same

Example data:

> dat
# A tibble: 14 x 6
   patid   age gender status  pracid eventdate
   <dbl> <dbl> <chr>  <chr>    <dbl> <date>
 1     1    10 M      case       100 23-05-20
 2     2    20 F      case       200 12-01-20
 3     3    44 M      case       300 21-02-20
 4     4    11 F      case       100 14-01-20
 5   111    12 M      control    100 NA
 6   222    11 M      control    100 NA
 7   333     8 M      control    100 NA
 8   444    12 F      control    200 NA
 9   555    11 M      control    100 NA
10   666    22 F      control    100 NA
11   777    21 F      control    100 NA
12   888    18 M      control    200 NA
13   999    21 M      control    200 NA
14  1000    18 M      control    100 NA
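
To make the example reproducible, the table above can be rebuilt roughly as follows. This is a sketch under two assumptions that the printout does not confirm: the event dates are written day-month-year (so 23-05-20 is read as 23 May 2020), and a base-R data frame is an acceptable stand-in for the tibble.

```r
# Rebuild the example data shown above.
# Assumption: dates are day-month-year with a two-digit year.
dat <- data.frame(
  patid  = c(1, 2, 3, 4, 111, 222, 333, 444, 555, 666, 777, 888, 999, 1000),
  age    = c(10, 20, 44, 11, 12, 11, 8, 12, 11, 22, 21, 18, 21, 18),
  gender = c("M", "F", "M", "F", "M", "M", "M", "F", "M", "F", "F", "M", "M", "M"),
  status = c(rep("case", 4), rep("control", 10)),
  pracid = c(100, 200, 300, 100, 100, 100, 100, 200, 100, 100, 100, 200, 200, 100),
  eventdate = as.Date(
    c("23-05-20", "12-01-20", "21-02-20", "14-01-20", rep(NA, 10)),
    format = "%d-%m-%y"
  ),
  stringsAsFactors = FALSE
)

# Inspect column types; eventdate should be a Date, NA for all controls.
str(dat)
```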

Expected outcome: for patid = 1, the eligible controls are listed below, and I just need to select 3 of them at random.

patid   age gender  group     pracid
111     12      M   control   100
222     11      M   control   100
333     8       M   control   100
555     11      M   control   100

I do not want two cases to share the same control: every case needs its own unique controls (unique patid). I would also like the final output to show, for each control, which case it was matched to (in the example below, all are matched to patid 1), and I want the event date of the case copied to its controls as well. E.g.:

patid   age gender  group     pracid  matched_id match_eventdate
1       10      M   case      100     1              23-05-20
111     12      M   control   100     1              23-05-20
222     11      M   control   100     1              23-05-20
333     8       M   control   100     1              23-05-20
555     11      M   control   100     1              23-05-20

I need the event date to be copied because in other parts of the dataset I need to check how many diseases the cases and controls were diagnosed with after that event date (the event date is effectively the index date for both cases and controls).


  • This is straightforward using MatchIt. Below is the code you would use to perform the matching:

    library(MatchIt)

    # dat is the data frame shown in the question
    m.out <- matchit(I(status == "case") ~ age, data = dat,
                     exact = ~pracid + gender,
                     caliper = c(age = 3), std.caliper = FALSE,
                     distance = "euclidean", ratio = 3)

    This does 3:1 nearest neighbor matching on age, ensuring that patients are exactly matched on pracid and gender and that all controls are within 3 years of age of their matched case.
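
    To see what the exact and caliper constraints mean for a single case, here is a small base-R sketch. It is not how matchit() works internally; it only filters the controls that would be eligible for the case with patid 1, using a hand-copied subset of the example data. The object names (case, controls, eligible) are made up for illustration.

    ```r
    # Case patid 1 and the male controls in practice 100, copied from the
    # example data in the question.
    case <- data.frame(patid = 1, age = 10, gender = "M", pracid = 100)
    controls <- data.frame(
      patid  = c(111, 222, 333, 555, 1000),
      age    = c(12, 11, 8, 11, 18),
      gender = "M",
      pracid = 100
    )

    # Eligible controls share pracid and gender and are within 3 years of age.
    eligible <- controls[
      controls$pracid == case$pracid &
        controls$gender == case$gender &
        abs(controls$age - case$age) <= 3,
    ]

    # patids 111, 222, 333 and 555; patid 1000 (age 18) is outside the caliper
    eligible$patid
    ```

    Note that nearest neighbor matching then picks the closest eligible controls by age rather than a random three, and because matching is done without replacement by default, a control used for one case is not reused for another.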

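    Next we extract the matched dataset using match.data():

    m.data <- match.data(m.out, subclass = "matched_id")

    Finally, we will re-order the dataset and fill in the missing event dates:

    m.data <- m.data[with(m.data, order(matched_id, status, patid)),]
    m.data$match_eventdate <- m.data$eventdate
    for (i in levels(m.data$matched_id)) {
      in_i <- which(m.data$matched_id == i)
      # copy the case's event date (the only non-missing one) to the whole set
      m.data$match_eventdate[in_i] <- na.omit(m.data$eventdate[in_i])
    }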
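
    The fill-in loop can also be written as a lookup, which may be easier to read on large data. The sketch below uses a made-up miniature of a matched dataset (the object name matched and its rows are hypothetical, not the result of matching the example data) and copies each case's event date to every member of its set with match().

    ```r
    # Hypothetical matched dataset: two sets, the case row carries the date.
    matched <- data.frame(
      patid      = c(10, 11, 12, 20, 21),
      status     = c("case", "control", "control", "case", "control"),
      matched_id = factor(c(1, 1, 1, 2, 2)),
      eventdate  = as.Date(c("2020-05-23", NA, NA, "2020-01-14", NA))
    )

    # One row per matched set: the case holds the index date.
    cases <- matched[matched$status == "case", c("matched_id", "eventdate")]

    # Find each row's set among the cases and copy that date across.
    idx <- match(matched$matched_id, cases$matched_id)
    matched$match_eventdate <- cases$eventdate[idx]

    matched
    ```

    This relies on every matched set containing exactly one case, which holds for the 3:1 matching described here.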

    You can examine the matched sets either by printing the matched dataset, which will look close to what you specified above, or by examining m.out$match.matrix, which identifies which controls are matched to each case.
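
    If the wide layout of the match matrix is awkward to read, it can be reshaped to one row per case-control pair. The sketch below uses a made-up stand-in matrix (mm) rather than a real m.out$match.matrix, and assumes the layout is one row per case, named by the case's row name, with matched controls' row names in the columns and NA where a slot went unfilled.

    ```r
    # Hypothetical stand-in for m.out$match.matrix; values are made up.
    mm <- matrix(
      c("5", "6", "9",
        "8", NA,  NA),
      nrow = 2, byrow = TRUE,
      dimnames = list(c("1", "2"), NULL)
    )

    # as.vector() reads the matrix column by column, so the case row names
    # are repeated once per column to stay aligned with it.
    pairs <- data.frame(
      case_row    = rep(rownames(mm), times = ncol(mm)),
      control_row = as.vector(mm),
      stringsAsFactors = FALSE
    )

    # Drop unfilled slots and group the pairs by case.
    pairs <- pairs[!is.na(pairs$control_row), ]
    pairs <- pairs[order(pairs$case_row), ]
    pairs
    ```

    With default row names, the entries can then be turned back into patid values by indexing the original data with as.integer() of these row names; check the row names of your own data before relying on that.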

    Note that if any case does not receive any controls, it will be dropped from the matched dataset. If it receives 1 or 2 controls, it will remain in the dataset, but the matched controls will have weights associated with them that you must include when estimating the effect. If you don't want any cases that have fewer than 3 controls, there is no way to remove them in matchit(), but you can drop them from the dataset using the following:

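    # m.data is the matched dataset returned by match.data().
    # A complete set has 4 rows: the case plus its 3 controls.
    subclass_3 <- levels(m.data$matched_id)[table(m.data$matched_id) == 4]
    m.data <- m.data[m.data$matched_id %in% subclass_3,]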