I'm trying to make a new variable in my dataframe based on date overlapping.
My data
looks like this
People Company Start End
1 Person1 CompanyA 2017-04-27 2019-12-31
2 Person2 CompanyB 2017-04-27 2019-08-30
3 Person3 CompanyA 2013-04-27 2019-12-31
4 Person4 CompanyB 2017-04-27 2019-12-31
5 Person5 CompanyC 2016-05-13 2019-12-31
I would like to check if each person was in the company at the following dates
2016-12-31
2017-12-31
2018-12-31
2019-12-31
...
I'm using the %overlaps%
from the DescTools
library. And if I check just one row in my data
c(data$Start[1], data$End[1]) %overlaps% dates[1]
it returns TRUE
(or FALSE
in other cases) as I expect. But I would like to know this for each row and save in my variables data$y2017
, data$y2018
, data$y2019
etc. But I have trouble with this.
Because if I just try to do this on every row it doesn't work and just returns NA
for each row. For example if I try to see who was there in 2016-12-31
:
data$y2016 <- c(data$Start, data$End) %overlaps% dates[1]
How would I do this?
Here is a data.table
approach
Core of this solution is the data.table::foverlaps()
-function to check is a Start-End interval in the DT.dates
data.table is within the Start-End interval in DT
.
library(data.table)
# Sample data
DT <- fread(" People Company Start End
Person1 CompanyA 2017-04-27 2019-12-31
Person2 CompanyB 2017-04-27 2019-08-30
Person3 CompanyA 2013-04-27 2019-12-31
Person4 CompanyB 2017-04-27 2019-12-31
Person5 CompanyC 2016-05-13 2019-12-31")
DT[, Start := as.Date(Start)]
DT[, End := as.Date(End)]
DT.dates <- data.table( Start = as.Date(c("2016-12-31","2017-12-31","2018-12-31","2019-12-31")),
End = as.Date(c("2016-12-31","2017-12-31","2018-12-31","2019-12-31")))
# Set keys
setkey(DT, Start, End)
setkey(DT.dates, Start, End)
# Perform overlapjoin and cast to wide
dcast(foverlaps(DT.dates, DT), People + Company + Start + End ~ i.Start,
value.var = "i.Start", fun.aggregate = length)
# People Company Start End 2016-12-31 2017-12-31 2018-12-31 2019-12-31
# 1: Person1 CompanyA 2017-04-27 2019-12-31 0 1 1 1
# 2: Person2 CompanyB 2017-04-27 2019-08-30 0 1 1 0
# 3: Person3 CompanyA 2013-04-27 2019-12-31 1 1 1 1
# 4: Person4 CompanyB 2017-04-27 2019-12-31 0 1 1 1
# 5: Person5 CompanyC 2016-05-13 2019-12-31 1 1 1 1