Search code examples
rmappingtibble

Mapping tibble based on date, depending on date range


I have 2 tibbles. One is a list of stocks and an associated date. The other is a list of stocks and their sector, where the sector is define between 2 dates.

# stocks IDs and an associated date
stocks<-
  tibble(
    StockId = c(1, 1, 1, 2, 2),
    Date = c(
      as.Date("1998-01-05"),
      as.Date("2001-07-01"),
      as.Date("2015-01-01"),
      as.Date("1999-05-01"),
      as.Date("2003-02-02")
    )
  )

# stock IDs and their sector.
# For each stock, the sector has a start and end date as stocks change sector over time
sectors <- tibble(StockId = c(1, 1, 2, 2),
  SectorId = c(5050, 2540, 5510, 1010),
  StartDate = c(
    as.Date("1980-01-05"),
    as.Date("2000-07-01"),
    as.Date("1999-01-01"),
    as.Date("2010-01-02")
  ),
  EndDate = c(
    as.Date("2000-06-30"),
    as.Date("9999-12-31"),
    as.Date("2010-01-01"),
    as.Date("9999-01-01")
  )
)

I would like to get to a final tibble where I have stock, date and the sector for that particular date. In this case the result should be:

result <-
  tibble(
    StockId = c(1, 1, 1, 2, 2),
    Date = c(
      as.Date("1998-01-05"),
      as.Date("2001-07-01"),
      as.Date("2015-01-01"),
      as.Date("1999-05-01"),
      as.Date("2003-02-02")
    ),
    SectorId = c(5050, 2540, 2540, 5510, 5510)
  )

Note that a for-loop is not efficient here, as my tibble has much more data then this example.


Solution

  • You can also use dplyr, which includes a join_by function for overlap joins (since 1.1.0):

    #devtools::install_github("tidyverse/dplyr")
    library(dplyr)
    
    inner_join(stocks, sectors, by = join_by(StockId, between(Date, StartDate, EndDate)))
    
    # A tibble: 5 × 5
      StockId Date       SectorId StartDate  EndDate   
        <dbl> <date>        <dbl> <date>     <date>    
    1       1 1998-01-05     5050 1980-01-05 2000-06-30
    2       1 2001-07-01     2540 2000-07-01 9999-12-31
    3       1 2015-01-01     2540 2000-07-01 9999-12-31
    4       2 1999-05-01     5510 1999-01-01 2010-01-01
    5       2 2003-02-02     5510 1999-01-01 2010-01-01
    

    And the data.table option:

    library(data.table)
    setDT(stocks)[setDT(sectors), .(StockId, Date=x.Date, SectorId), 
           on=.(StockId, Date>=StartDate, Date<=EndDate), nomatch=0L]
    
       StockId       Date SectorId
    1:       1 2001-07-01     2540
    2:       1 2015-01-01     2540
    3:       1 1998-01-05     5050
    4:       2 1999-05-01     5510
    5:       2 2003-02-02     5510