I have data that includes a date and day of week.
Example data:
(1) alldays -- this data includes all dates and weekdays for the appropriate years.
(2) dt1 -- this is the example dataset that includes the date Adate
, and day of week dow
that will be used to identify the year/month/weekday and then look for all dates within that same month for the given weekday. For example - all Thursdays in July of 2017 will need to row bound to the original data.
library(data.table)
library(tidyverse)
library(lubridate)
alldays <- data.table (date = seq(as.Date("2010-01-01"),
as.Date("2011-12-31"), by="days"))
alldays <- alldays %>%
dplyr::mutate(year = lubridate::year(date),
month = lubridate::month(date),
day = lubridate::day(date),
dow = weekdays(date))
setDT(alldays)
head(alldays)
date year month day dow
1 2010-01-01 2010 1 1 Friday
2 2010-01-02 2010 1 2 Saturday
3 2010-01-03 2010 1 3 Sunday
4 2010-01-04 2010 1 4 Monday
5 2010-01-05 2010 1 5 Tuesday
6 2010-01-06 2010 1 6 Wednesday
Here is an example of the primary dataset
id <- seq(1:2)
admit <- rep(1,2)
zip <- c(54123, 54789)
Adate <- as.Date(c("2010-07-15","2011-03-14"))
year <- c(2010, 2011)
month <- c(7,3)
day <- c(15,14)
dow <- c("Thursday","Monday")
dt1 <- data.table(id, admit, zip, Adate, year, month, day, dow)
dt1
#> id admit zip Adate year month day dow
#> 1: 1 1 54123 2010-07-15 2010 7 15 Thursday
#> 2: 2 1 54789 2011-03-14 2011 3 14 Monday
The resulting dataset should be:
id admit zip Adate year month day dow
1: 1 0 54123 2010-07-01 2010 7 1 Thursday
2: 1 0 54123 2010-07-08 2010 7 8 Thursday
3: 1 1 54123 2010-07-15 2010 7 15 Thursday
4: 1 0 54123 2010-07-22 2010 7 22 Thursday
5: 1 0 54123 2010-07-29 2010 7 29 Thursday
6: 2 0 54789 2011-03-07 2011 3 7 Monday
7: 2 1 54789 2011-03-14 2011 3 14 Monday
8: 2 0 54789 2011-03-21 2011 3 21 Monday
9: 2 0 54789 2011-03-28 2011 3 28 Monday
So we can see that the first date dt1
2010-07-15 associated with id=1, which was a Thursday fell within a month with 4 additional Thursday in that month which were added to the dataset. The variable admit
is the indicator of whether that row was in the original or subsequently added by virtue of the being matched.
I have tried first selecting the additional dates from alldays
with matching weekdays but I am running into issues on how to rowbind those back into the original dataset while filling in the other values appropriately. Eventually I will be running this on a dataset with about 300,000 rows.
Here is an option:
alldays[dt1[, .(id, zip, admit=0L, year, month, dow)],
on=.(year, month, dow), allow.cartesian=TRUE][
dt1, on=.(id, date=Adate), admit := i.admit][]
output:
date year month day dow id zip admit
1: 2010-07-01 2010 7 1 Thursday 1 54123 0
2: 2010-07-08 2010 7 8 Thursday 1 54123 0
3: 2010-07-15 2010 7 15 Thursday 1 54123 1
4: 2010-07-22 2010 7 22 Thursday 1 54123 0
5: 2010-07-29 2010 7 29 Thursday 1 54123 0
6: 2011-03-07 2011 3 7 Monday 2 54789 0
7: 2011-03-14 2011 3 14 Monday 2 54789 1
8: 2011-03-21 2011 3 21 Monday 2 54789 0
9: 2011-03-28 2011 3 28 Monday 2 54789 0