Search code examples
rmergeintervals

How to filter rows based on a date range from the result of a specific column in a group


Problem

I have data in four columns (ID), The date a test was done (TestDT) a column with the numerical result of a test of interest (Test1) and another column with a numerical result of a different test of interest (Test2).

ID  TestDT      Test1   Test2
1   2011-03-02  23       NA
2   2011-03-14  NA       16
2   2011-03-15  NA       52
2   2011-11-22  24       NA
2   2011-12-12  NA       77
3   2009-03-02  NA       23
3   2009-05-02  15       NA
3   2011-15-02  NA       66
4   2017-07-03  NA       22

Desired result

I want to get the average result for any of Test2 which happened within a 3 month time period (ie before or after) Test 1 for each ID.

ID  TestDT      Test1   Test2 Av_of_test2_within_range
1   2011-03-02  23       NA    34
2   2011-11-22  24       NA    77
3   2009-05-02  15       NA    23

I am having difficulty trying to filter for Test2 results that happen within the time range

Attempt

I have tried using filter_time from the tibbletime package as follows:

library(tibbletime)
FB <- as_tbl_time(myData, index = TestDT)
FB %>% group_by(ID) %>%filter_time(TestDT ~ TestDT+84)

but get the error :

Error: Problem with `filter()` input `..1`.
x object 'TestDT' not found
i Input `..1` is `{ ... }`.
The error occured in group 1: 

Solution

  • data.table has foverlaps function that merges two datasets on the date ranges.

    You will need to split your data into test1 and test2 and do something like this:

    library(data.table)
    
    df <- read.table(text = "ID  TestDT      Test1   Test2
    1   2011-03-02  23       NA
    2   2011-03-14  NA       16
    2   2011-03-15  NA       52
    2   2011-11-22  24       NA
    2   2011-12-12  NA       77
    3   2009-03-02  NA       23
    3   2009-05-02  15       NA
    3   2011-12-02  NA       66
    4   2017-07-03  NA       22", header = TRUE)
    
    dt <- data.table(df)
    dt[, TestDT := as.Date(TestDT)]
    test1 <- dt[!is.na(Test1), .(ID, TestDT, Test1)]
    test2 <- dt[!is.na(Test2), .(ID, TestDT, Test2)]
    test1[, start.date := TestDT - 91]
    test1[, end.date := TestDT + 91]
    test2[, start.date := TestDT]
    test2[, end.date := TestDT]
    setkey(test2, ID, start.date, end.date)
    
    res <- foverlaps(
      test1, 
      test2, 
      by.x = c("ID", "start.date", "end.date"),
      by.y = c("ID", "start.date", "end.date")
    )