Search code examples
rdata.tablerbind

How to row bind all cases of a particular weekday in a given year-month into an R dataset


I have data that includes a date and day of week.

  • I would like to identify all instances of a particular weekday that match the given year/month/weekday in the original data. For instance if the first record has the date "2010-07-05" which is a Thursday, I want to rowbind all Thursdays that occur in July of 2010 to my original dataset.
  • While adding those new rows, I also want to fill in those new rows with values from the original data for all columns, except one. The exception is a variable which indicates whether or not that row was in the original dataset or not.

Example data:

(1) alldays -- this data includes all dates and weekdays for the appropriate years.

(2) dt1 -- this is the example dataset that includes the date Adate, and day of week dow that will be used to identify the year/month/weekday and then look for all dates within that same month for the given weekday. For example - all Thursdays in July of 2017 will need to row bound to the original data.

library(data.table)
library(tidyverse)
library(lubridate)

alldays <- data.table (date = seq(as.Date("2010-01-01"),
                                  as.Date("2011-12-31"), by="days"))
alldays <- alldays %>%
  dplyr::mutate(year = lubridate::year(date), 
                month = lubridate::month(date), 
                day = lubridate::day(date),
                dow = weekdays(date))
setDT(alldays)
head(alldays)
        date year month day       dow
1 2010-01-01 2010     1   1    Friday
2 2010-01-02 2010     1   2  Saturday
3 2010-01-03 2010     1   3    Sunday
4 2010-01-04 2010     1   4    Monday
5 2010-01-05 2010     1   5   Tuesday
6 2010-01-06 2010     1   6 Wednesday

Here is an example of the primary dataset

id <- seq(1:2)
admit <- rep(1,2)
zip <- c(54123, 54789)
Adate <- as.Date(c("2010-07-15","2011-03-14"))
year <- c(2010, 2011)
month <- c(7,3)
day <- c(15,14)
dow <- c("Thursday","Monday")
dt1 <- data.table(id, admit, zip, Adate, year, month, day, dow)
dt1
#>    id admit   zip      Adate year month day      dow
#> 1:  1     1 54123 2010-07-15 2010     7  15 Thursday
#> 2:  2     1 54789 2011-03-14 2011     3  14   Monday

The resulting dataset should be:

   id admit   zip      Adate year month day      dow
1:  1     0 54123 2010-07-01 2010     7   1 Thursday
2:  1     0 54123 2010-07-08 2010     7   8 Thursday
3:  1     1 54123 2010-07-15 2010     7  15 Thursday
4:  1     0 54123 2010-07-22 2010     7  22 Thursday
5:  1     0 54123 2010-07-29 2010     7  29 Thursday
6:  2     0 54789 2011-03-07 2011     3   7   Monday
7:  2     1 54789 2011-03-14 2011     3  14   Monday
8:  2     0 54789 2011-03-21 2011     3  21   Monday
9:  2     0 54789 2011-03-28 2011     3  28   Monday

So we can see that the first date dt1 2010-07-15 associated with id=1, which was a Thursday fell within a month with 4 additional Thursday in that month which were added to the dataset. The variable admit is the indicator of whether that row was in the original or subsequently added by virtue of the being matched.

I have tried first selecting the additional dates from alldays with matching weekdays but I am running into issues on how to rowbind those back into the original dataset while filling in the other values appropriately. Eventually I will be running this on a dataset with about 300,000 rows.


Solution

  • Here is an option:

    alldays[dt1[, .(id, zip, admit=0L, year, month, dow)], 
        on=.(year, month, dow), allow.cartesian=TRUE][
            dt1, on=.(id, date=Adate), admit := i.admit][]
    

    output:

             date year month day      dow id   zip admit
    1: 2010-07-01 2010     7   1 Thursday  1 54123     0
    2: 2010-07-08 2010     7   8 Thursday  1 54123     0
    3: 2010-07-15 2010     7  15 Thursday  1 54123     1
    4: 2010-07-22 2010     7  22 Thursday  1 54123     0
    5: 2010-07-29 2010     7  29 Thursday  1 54123     0
    6: 2011-03-07 2011     3   7   Monday  2 54789     0
    7: 2011-03-14 2011     3  14   Monday  2 54789     1
    8: 2011-03-21 2011     3  21   Monday  2 54789     0
    9: 2011-03-28 2011     3  28   Monday  2 54789     0