Search code examples
rdplyrtidyverse

Check if a numeric interval is included in another numeric interval in R


I am working with a dataframe similar to the one below:

Start_date  End_date  Start_time_window  End_time_window
10          15        12                 14
11          16        20                 22
15          20        14                 19
14          21        16                 22

I need to create a new column (e.g. Available) that is populated with yes/no (or TRUE/FALSE, I don't mind) when these intervals overlap completely or partially. So I should get something like this:

Start_date  End_date  Start_time_window  End_time_window  Available
10          15        12                 14               Yes
11          16        20                 22               No
15          20        14                 19               Yes
14          21        16                 22               Yes

I have tried dplyr and DescTools solutions like:

new_df <- df %>% mutate(Available = if_else(
   condition = Start_time_window >= Start_date &          
   Start_time_window <= End_date & 
   End_time_window >= Start_date &          
   End_time_window <= End_date,
       true = "Yes",
       false = "No"
))

or

new_df <- df %>% mutate(Available = c(Start_time_window, End_time_window) %overlaps%                       
                                    c(Start_date, End_date))

These solutions run but do not provide the correct results.

Any ideas? Thanks!


Solution

  • Here's one way of doing this:

    dplyr::mutate(df, Available = between(Start_date, Start_time_window, End_time_window) | 
                           between(End_date, Start_time_window, End_time_window) |
                           between(Start_time_window, Start_date, End_date))
    

    The reason this works is, you have four things you're looking for:

    1. Times where the period overlaps the start of the window, but not the end
    2. Same as above, but with the end of the window
    3. Both of the above at the same time
    4. The window is within the time period entirely

    The first two in the code get the first two in the list, and the third one for free. The fourth is captured by the third in the code.

    Output:

      Start_date End_date Start_time_window End_time_window Available
    1         10       15                12              14      TRUE
    2         11       16                20              22     FALSE
    3         15       20                14              19      TRUE
    4         14       21                16              22      TRUE
    

    Update:

    The reason your code doesn't work is, well, going through it line by line:

    Start_time_window >= Start_date &          
                               Start_time_window <= End_date
    

    This is equivalent to Start_date <= Start_time_window <= End_date, in other words, the start_time_window is between the start and end date. But for how many rows is this actually the case? Only two: the first and the last, the third row's start value isn't between them.

    End_time_window >= Start_date &          
                               End_time_window <= End_date,
    

    This is the same as the previous one, but with the End_time_window instead. This one is true for the first and third rows.

    Since the only row where both of these parts is true is the first row, that is the only one which evaluates to true. Every other row is false.