I am working with a dataframe similar to the one below:
Start_date End_date Start_time_window End_time_window
10 15 12 14
11 16 20 22
15 20 14 19
14 21 16 22
I need to create a new column (e.g. Available) that is populated with yes/no (or TRUE/FALSE, I don't mind) when these intervals overlap completely or partially. So I should get something like this:
Start_date End_date Start_time_window End_time_window Available
10 15 12 14 Yes
11 16 20 22 No
15 20 14 19 Yes
14 21 16 22 Yes
I have tried dplyr and DescTools solutions like:
new_df <- df %>% mutate(Available = if_else(
condition = Start_time_window >= Start_date &
Start_time_window <= End_date &
End_time_window >= Start_date &
End_time_window <= End_date,
true = "Yes",
false = "No"
))
or
new_df <- df %>% mutate(Available = c(Start_time_window, End_time_window) %overlaps%
c(Start_date, End_date))
These solutions run but do not provide the correct results.
Any ideas? Thanks!
Here's one way of doing this:
dplyr::mutate(df, Available = between(Start_date, Start_time_window, End_time_window) |
between(End_date, Start_time_window, End_time_window) |
between(Start_time_window, Start_date, End_date))
The reason this works is, you have four things you're looking for:
The first two in the code get the first two in the list, and the third one for free. The fourth is captured by the third in the code.
Output:
Start_date End_date Start_time_window End_time_window Available
1 10 15 12 14 TRUE
2 11 16 20 22 FALSE
3 15 20 14 19 TRUE
4 14 21 16 22 TRUE
Update:
The reason your code doesn't work is, well, going through it line by line:
Start_time_window >= Start_date &
Start_time_window <= End_date
This is equivalent to Start_date <= Start_time_window <= End_date, in other words, the start_time_window is between the start and end date. But for how many rows is this actually the case? Only two: the first and the last, the third row's start value isn't between them.
End_time_window >= Start_date &
End_time_window <= End_date,
This is the same as the previous one, but with the End_time_window instead. This one is true for the first and third rows.
Since the only row where both of these parts is true is the first row, that is the only one which evaluates to true. Every other row is false.