r dplyr intervals lubridate

Checking that a series of dates are within a series of different intervals


This seems like it would be a simple thing to do, but I am stumped.

I was using the tidyverse material as a guide.

I have a list of recession time periods, and I want to create a data frame as an output that lists every date and whether or not that date is in a recession. I would like to keep the solution in dplyr format.

Here is a reproducible example:

library(lubridate)
library(tidyverse)

# Sample data set

my_df <-
structure(list(recession_start = structure(c(1400, 3652, 4199, 
7486, 11382, 13848), class = "Date"), recession_end = structure(c(1885, 
3834, 4687, 7729, 11627, 14396), class = "Date"), recession_interval = new("Interval", 
    .Data = c(41904000, 15724800, 42163200, 20995200, 21168000, 
    47347200), start = structure(c(120960000, 315532800, 362793600, 
    646790400, 983404800, 1196467200), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), tzone = "UTC")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))


> my_df
# A tibble: 6 x 3
  recession_start recession_end recession_interval            
  <date>          <date>        <Interval>                    
1 1973-11-01      1975-03-01    1973-11-01 UTC--1975-03-01 UTC
2 1980-01-01      1980-07-01    1980-01-01 UTC--1980-07-01 UTC
3 1981-07-01      1982-11-01    1981-07-01 UTC--1982-11-01 UTC
4 1990-07-01      1991-03-01    1990-07-01 UTC--1991-03-01 UTC
5 2001-03-01      2001-11-01    2001-03-01 UTC--2001-11-01 UTC
6 2007-12-01      2009-06-01    2007-12-01 UTC--2009-06-01 UTC



# Get every day in the range of dates
my_dates <- seq(first(my_df$recession_start), today(), by = "day")


# Create a list of intervals
recession_intervals <- list(my_df$recession_interval)


# Check to see if `my_dates` are in the intervals
recession <- my_dates %within% recession_intervals  # Throws warning and does not give expected results

I suspect this is because my list of dates is a single list vs. multiple lists as in the tidyverse example, but I'm not sure how to create multiple lists other than manually.

The desired output is a data frame with one row per date and a logical (TRUE/FALSE) column indicating whether that date falls inside any recession interval. Something like:

recession_df <- data.frame(Date = my_dates, recession = recession) 

Output would look like this:

         Date recession
1  1973-11-01      TRUE
2  1973-11-02      TRUE
3  1973-11-03      TRUE
4  1973-11-04      TRUE
5  1973-11-05      TRUE
6  1973-11-06      TRUE
7  1973-11-07      TRUE
8  1973-11-08      TRUE
9  1973-11-09      TRUE
10 1973-11-10      TRUE

Thanks for any help!


Solution

  • One option is to loop over 'my_dates' with map, check whether each date is %within% any of the intervals in the 'recession_interval' column, build a one-row tibble holding the 'Date' and the logical result, and row-bind the pieces into a single dataset with the _dfr suffix.

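    library(purrr)
    # For each date: build a one-row tibble, then test that date against all
    # six intervals and keep TRUE if it falls in any of them
    out <- map_dfr(my_dates, ~ tibble(Date = .x, 
         recession = any(Date %within% my_df$recession_interval)))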
    

    -output

    # A tibble: 17,381 x 2
       Date       recession
       <date>     <lgl>    
     1 1973-11-01 TRUE     
     2 1973-11-02 TRUE     
     3 1973-11-03 TRUE     
     4 1973-11-04 TRUE     
     5 1973-11-05 TRUE     
     6 1973-11-06 TRUE     
     7 1973-11-07 TRUE     
     8 1973-11-08 TRUE     
     9 1973-11-09 TRUE     
    10 1973-11-10 TRUE     
    # … with 17,371 more rows
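
  • A second option, sketched here as an assumption rather than as part of the answer above, is to pass %within% a list with one interval per element. As far as I can tell, the attempt in the question fails because list(my_df$recession_interval) is a list of length one whose only element is the whole six-element Interval vector, so the dates get recycled against the intervals element by element. Splitting the vector by index avoids that. The two intervals and four dates below are small stand-ins for my_df$recession_interval and my_dates, and the sketch assumes a lubridate version that accepts a list of intervals on the right-hand side, as in the tidyverse example the question refers to.

```r
library(lubridate)
library(tibble)

# Stand-in for my_df$recession_interval: the first two recessions from the question.
recession_interval <- interval(
  ymd(c("1973-11-01", "1980-01-01"), tz = "UTC"),
  ymd(c("1975-03-01", "1980-07-01"), tz = "UTC")
)

# list(recession_interval) would be a list of length 1 holding the whole vector.
# Splitting by index gives one list element per interval instead.
interval_list <- lapply(seq_along(recession_interval),
                        function(i) recession_interval[i])

# Stand-in for my_dates: four days around the start of the first recession.
my_dates <- seq(as.Date("1973-10-30"), as.Date("1973-11-02"), by = "day")

# With a list on the right, %within% is TRUE when a date falls in any interval.
recession <- my_dates %within% interval_list

recession_df <- tibble(Date = my_dates, recession = recession)
print(recession_df)
```

    With the question's data the same two steps would use my_df$recession_interval and the full my_dates. This version iterates over the six intervals rather than over every date, so it should scale better than building one tibble per day.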