
Case-control study "exact" match with overlapping time intervals


I would like to perform case-control matching that takes time intervals into account. If a control observation has the same values as a case for the independent variables X1 and X2, and its time interval X3 overlaps the case's interval, I would like it to be matched to that case.

For instance, suppose the following df1:

row Y   X1   X2              X3
1   0   1   1   2017-01-06 UTC--2017-01-10 UTC
2   0   1   1   2017-01-07 UTC--2017-01-11 UTC
3   0   1   1   2017-01-08 UTC--2017-01-12 UTC
4   0   1   1   2017-01-09 UTC--2017-01-13 UTC
5   0   1   1   2017-01-10 UTC--2017-01-14 UTC
6   1   1   1   2017-01-11 UTC--2017-01-15 UTC
7   0   1   1   2017-01-12 UTC--2017-01-16 UTC
8   0   1   1   2017-01-13 UTC--2017-01-17 UTC
9   0   1   1   2017-01-14 UTC--2017-01-18 UTC
10  0   1   1   2017-01-15 UTC--2017-01-19 UTC
11  0   1   1   2017-01-16 UTC--2017-01-20 UTC

Created with the following code:

library(lubridate)
library(MatchIt)

df1 <- data.frame(Y=c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),     
              X1=rep(1, 11), 
              X2=rep(1,11), 
              X3=c(interval(ymd(20170106), ymd(20170110)),
                   interval(ymd(20170107), ymd(20170111)), 
                   interval(ymd(20170108), ymd(20170112)), 
                   interval(ymd(20170109), ymd(20170113)), 
                   interval(ymd(20170110), ymd(20170114)),
                   interval(ymd(20170111), ymd(20170115)),
                   interval(ymd(20170112), ymd(20170116)),
                   interval(ymd(20170113), ymd(20170117)),
                   interval(ymd(20170114), ymd(20170118)),
                   interval(ymd(20170115), ymd(20170119)),
                   interval(ymd(20170116), ymd(20170120))))

matchit(Y ~ X1 + X2 + X3, data=df1, method="exact")

Output:

summary(matchit(Y ~ X1 + X2 + X3, data=df1, method="exact"))

 Sample sizes:
          Control Treated
All            10       1
Matched        10       1
Unmatched       0       0

match.data(matchit(Y ~ X1 + X2 + X3, data=df1, method="exact"))

row Y   X1   X2              X3              weights   subclass
1   0   1   1   2017-01-06 UTC--2017-01-10 UTC   1   1
2   0   1   1   2017-01-07 UTC--2017-01-11 UTC   1   1
3   0   1   1   2017-01-08 UTC--2017-01-12 UTC   1   1
4   0   1   1   2017-01-09 UTC--2017-01-13 UTC   1   1
5   0   1   1   2017-01-10 UTC--2017-01-14 UTC   1   1
6   1   1   1   2017-01-11 UTC--2017-01-15 UTC   1   1
7   0   1   1   2017-01-12 UTC--2017-01-16 UTC   1   1
8   0   1   1   2017-01-13 UTC--2017-01-17 UTC   1   1
9   0   1   1   2017-01-14 UTC--2017-01-18 UTC   1   1
10  0   1   1   2017-01-15 UTC--2017-01-19 UTC   1   1
11  0   1   1   2017-01-16 UTC--2017-01-20 UTC   1   1

I would like a match between row 6 (the case) and rows 2, 3, 4, 5, 7, 8, 9 and 10 (controls). That is, if a control's time interval overlaps the period from 11 Jan 2017 to 15 Jan 2017, I want that control to be matched.

As you can see, the result is a 1:10 match rather than the 1:8 match I expect.
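The expected set of controls can be checked directly with lubridate, independently of MatchIt. The following is a minimal sketch that rebuilds only the X3 column of df1; the names starts, x3, case_interval and overlaps are introduced here for illustration. It relies on int_overlaps treating intervals that merely share an endpoint (rows 2 and 10) as overlapping.

```r
library(lubridate)

# Rebuild the eleven intervals of df1: each starts one day later and lasts four days
starts <- ymd(20170106) + days(0:10)
x3 <- interval(starts, starts + days(4))

# Row 6 is the case (Y == 1)
case_interval <- x3[6]

# TRUE where an interval shares at least one instant with the case interval
overlaps <- int_overlaps(x3, case_interval)

# Rows that overlap the case, excluding the case itself
setdiff(which(overlaps), 6L)
# expected: rows 2, 3, 4, 5, 7, 8, 9 and 10
```

Rows 1 and 11 are the only ones that end before 11 Jan or start after 15 Jan, which is why they should be left unmatched.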

EDIT: I changed the previous df example: https://pastebin.com/nwzpyUAr

EDIT2: Session Info: https://pastebin.com/g2Q1t1E0


Solution

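  • I came to the conclusion that I can match exactly on every variable except X3 (the time interval), then take the case's time interval and keep only the rows whose interval overlaps it, using the int_overlaps function from the lubridate package. The pipe and filter come from dplyr, so that package has to be loaded as well.

    library(dplyr)

    result <- match.data(matchit(Y ~ X1 + X2, data=df1, method="exact"))
    # Time interval of the (single) case
    case_timeInterval <- result[result$Y == 1,]$X3

    # Keep only the rows whose interval overlaps the case's interval
    result <- result %>%
      filter(int_overlaps(X3, case_timeInterval))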
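The filter above compares every row with one case interval, which fits the example because df1 holds a single case. With several cases, case_timeInterval would be a vector and int_overlaps would recycle it element by element instead of testing each row against every case. A possible way around this, shown only as a sketch on made-up data (the vectors y and x3 and the helper overlaps_any_case are hypothetical and not part of the original answer), is to test each interval against all case intervals:

```r
library(lubridate)

# Hypothetical data: five four-day intervals, rows 2 and 4 are cases
starts <- ymd(20170106) + days(c(0, 3, 5, 12, 20))
y <- c(0, 1, 0, 1, 0)
x3 <- interval(starts, starts + days(4))

# Intervals belonging to the cases
case_intervals <- x3[y == 1]

# Hypothetical helper: TRUE if interval i overlaps at least one case interval
overlaps_any_case <- function(i) any(int_overlaps(x3[i], case_intervals))

keep <- vapply(seq_along(x3), overlaps_any_case, logical(1))
which(keep)
```

In a real analysis the same check would presumably be applied within each subclass returned by match.data, so that a control is compared only with cases that share its X1 and X2 values.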