Search code examples
rdateoverlap

Dateoverlap in data.frame in R


I'm trying to make a new variable in my dataframe based on date overlapping.

My data looks like this

         People   Company  Start      End
1        Person1  CompanyA 2017-04-27 2019-12-31
2        Person2  CompanyB 2017-04-27 2019-08-30
3        Person3  CompanyA 2013-04-27 2019-12-31
4        Person4  CompanyB 2017-04-27 2019-12-31
5        Person5  CompanyC 2016-05-13 2019-12-31

I would like to check if each person was in the company at the following dates

    2016-12-31
    2017-12-31
    2018-12-31
    2019-12-31
...

I'm using the %overlaps% from the DescTools library. And if I check just one row in my data

c(data$Start[1], data$End[1]) %overlaps% dates[1]

it returns TRUE (or FALSE in other cases) as I expect. But I would like to know this for each row and save in my variables data$y2017, data$y2018, data$y2019 etc. But I have trouble with this.

Because if I just try to do this on every row it doesn't work and just returns NA for each row. For example if I try to see who was there in 2016-12-31:

data$y2016 <- c(data$Start, data$End) %overlaps% dates[1]

How would I do this?


Solution

  • Here is a data.table approach

    Core of this solution is the data.table::foverlaps()-function to check is a Start-End interval in the DT.dates data.table is within the Start-End interval in DT.

    library(data.table)
    # Sample data
    DT <- fread("         People   Company  Start      End
            Person1  CompanyA 2017-04-27 2019-12-31
            Person2  CompanyB 2017-04-27 2019-08-30
            Person3  CompanyA 2013-04-27 2019-12-31
            Person4  CompanyB 2017-04-27 2019-12-31
            Person5  CompanyC 2016-05-13 2019-12-31")
    DT[, Start := as.Date(Start)]
    DT[, End := as.Date(End)]
    DT.dates <- data.table( Start = as.Date(c("2016-12-31","2017-12-31","2018-12-31","2019-12-31")),
                            End   = as.Date(c("2016-12-31","2017-12-31","2018-12-31","2019-12-31")))
    
    # Set keys
    setkey(DT, Start, End)
    setkey(DT.dates, Start, End)
    # Perform overlapjoin and cast to wide
    dcast(foverlaps(DT.dates, DT), People + Company + Start + End ~ i.Start, 
          value.var = "i.Start", fun.aggregate = length)
    #     People  Company      Start        End 2016-12-31 2017-12-31 2018-12-31 2019-12-31
    # 1: Person1 CompanyA 2017-04-27 2019-12-31          0          1          1          1
    # 2: Person2 CompanyB 2017-04-27 2019-08-30          0          1          1          0
    # 3: Person3 CompanyA 2013-04-27 2019-12-31          1          1          1          1
    # 4: Person4 CompanyB 2017-04-27 2019-12-31          0          1          1          1
    # 5: Person5 CompanyC 2016-05-13 2019-12-31          1          1          1          1