I would like to perform case-control matching that takes time intervals into account. If a control observation has the same values as a case for the independent variables X1 and X2, and its time interval X3 overlaps the case's interval, I would like the two to be matched.
For instance, suppose the following df1:
row Y X1 X2 X3
1 0 1 1 2017-01-06 UTC--2017-01-10 UTC
2 0 1 1 2017-01-07 UTC--2017-01-11 UTC
3 0 1 1 2017-01-08 UTC--2017-01-12 UTC
4 0 1 1 2017-01-09 UTC--2017-01-13 UTC
5 0 1 1 2017-01-10 UTC--2017-01-14 UTC
6 1 1 1 2017-01-11 UTC--2017-01-15 UTC
7 0 1 1 2017-01-12 UTC--2017-01-16 UTC
8 0 1 1 2017-01-13 UTC--2017-01-17 UTC
9 0 1 1 2017-01-14 UTC--2017-01-18 UTC
10 0 1 1 2017-01-15 UTC--2017-01-19 UTC
11 0 1 1 2017-01-16 UTC--2017-01-20 UTC
Created with the following code:
library(lubridate)
library(MatchIt)
df1 <- data.frame(Y=c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
                  X1=rep(1, 11),
                  X2=rep(1, 11),
                  X3=c(interval(ymd(20170106), ymd(20170110)),
                       interval(ymd(20170107), ymd(20170111)),
                       interval(ymd(20170108), ymd(20170112)),
                       interval(ymd(20170109), ymd(20170113)),
                       interval(ymd(20170110), ymd(20170114)),
                       interval(ymd(20170111), ymd(20170115)),
                       interval(ymd(20170112), ymd(20170116)),
                       interval(ymd(20170113), ymd(20170117)),
                       interval(ymd(20170114), ymd(20170118)),
                       interval(ymd(20170115), ymd(20170119)),
                       interval(ymd(20170116), ymd(20170120))))
matchit(Y ~ X1 + X2 + X3, data=df1, method="exact")
Output:
summary(matchit(Y ~ X1 + X2 + X3, data=df1, method="exact"))
Sample sizes:
Control Treated
All 10 1
Matched 10 1
Unmatched 0 0
match.data(matchit(Y ~ X1 + X2 + X3, data=df1, method="exact"))
row Y X1 X2 X3 weights subclass
1 0 1 1 2017-01-06 UTC--2017-01-10 UTC 1 1
2 0 1 1 2017-01-07 UTC--2017-01-11 UTC 1 1
3 0 1 1 2017-01-08 UTC--2017-01-12 UTC 1 1
4 0 1 1 2017-01-09 UTC--2017-01-13 UTC 1 1
5 0 1 1 2017-01-10 UTC--2017-01-14 UTC 1 1
6 1 1 1 2017-01-11 UTC--2017-01-15 UTC 1 1
7 0 1 1 2017-01-12 UTC--2017-01-16 UTC 1 1
8 0 1 1 2017-01-13 UTC--2017-01-17 UTC 1 1
9 0 1 1 2017-01-14 UTC--2017-01-18 UTC 1 1
10 0 1 1 2017-01-15 UTC--2017-01-19 UTC 1 1
11 0 1 1 2017-01-16 UTC--2017-01-20 UTC 1 1
I would like a match between row 6 (the case) and rows 2, 3, 4, 5, 7, 8, 9 and 10 (the controls). That is, if a control's time interval overlaps the case's interval (11 Jan 2017 to 15 Jan 2017), I want that control to be matched.
As the output shows, I get a 1:10 match rather than the 1:8 match I want.
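To make the overlap rule concrete, here is a minimal base-R sketch of the check I have in mind. It assumes the interval endpoints are kept as plain Date vectors (`start`, `end`) instead of the single lubridate Interval column X3, and it treats intervals as closed, so intervals that only share an endpoint count as overlapping.

```r
# Interval endpoints from df1, stored as plain Date vectors (an assumption;
# the question stores them as one lubridate Interval column, X3).
start <- as.Date("2017-01-06") + 0:10
end   <- start + 4
Y     <- c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)

# Endpoints of the single case (row 6)
case_start <- start[Y == 1]
case_end   <- end[Y == 1]

# Two closed intervals overlap when each one starts no later than the other ends
overlaps_case <- start <= case_end & end >= case_start

# Controls whose interval overlaps the case's interval
matched_controls <- which(overlaps_case & Y == 0)
matched_controls
# 2 3 4 5 7 8 9 10
```

Rows 1 and 11 are excluded because their intervals end before 11 Jan or start after 15 Jan, which is the 1:8 result described above.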
EDIT: I changed the previous df example: https://pastebin.com/nwzpyUAr
EDIT2: Session Info: https://pastebin.com/g2Q1t1E0
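I came to the conclusion that I can match on all variables except X3 (the time interval), then take the case's time interval and use the int_overlaps function from the lubridate package to keep only the rows that overlap it:
library(dplyr)  # provides %>% and filter()
# Match exactly on X1 and X2 only, leaving the time interval out of the formula
result <- match.data(matchit(Y ~ X1 + X2, data=df1, method="exact"))
# Time interval of the (single) case
case_timeInterval <- result[result$Y == 1,]$X3
# Keep only the rows whose interval overlaps the case's interval
result <- result %>%
  filter(int_overlaps(X3, case_timeInterval))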
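The filter above relies on there being exactly one case. With several cases, the same idea could be applied per case. The following is only a rough base-R sketch under assumptions of my own: `match_overlapping` is a hypothetical helper (not part of MatchIt or lubridate), `df2` is made-up illustration data, and the interval is stored as two Date columns, `start` and `end`, rather than as an Interval column.

```r
# Hypothetical helper: for every case (Y == 1), list the controls (Y == 0)
# that share X1 and X2 and whose [start, end] interval overlaps the case's.
match_overlapping <- function(df) {
  cases    <- df[df$Y == 1, ]
  controls <- df[df$Y == 0, ]
  pairs <- lapply(seq_len(nrow(cases)), function(i) {
    k <- cases[i, ]
    hit <- controls$X1 == k$X1 & controls$X2 == k$X2 &
      controls$start <= k$end & controls$end >= k$start
    if (!any(hit)) return(NULL)  # this case has no eligible control
    data.frame(case_row = k$row, control_row = controls$row[hit])
  })
  do.call(rbind, pairs)
}

# Made-up example with two cases (rows 1 and 4)
df2 <- data.frame(
  row   = 1:6,
  Y     = c(1, 0, 0, 1, 0, 0),
  X1    = c(1, 1, 1, 2, 2, 1),
  X2    = 1,
  start = as.Date(c("2017-01-11", "2017-01-07", "2017-01-16",
                    "2017-02-01", "2017-02-03", "2017-02-02")),
  end   = as.Date(c("2017-01-15", "2017-01-11", "2017-01-20",
                    "2017-02-05", "2017-02-07", "2017-02-06"))
)

pairs <- match_overlapping(df2)
pairs
```

Here case 1 pairs with control 2 and case 4 pairs with control 5. Row 3 has the right covariates but no overlap, and row 6 overlaps case 4 in time but differs on X1.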