Search code examples
rdplyrdata.tablelubridate

Find the next available date when filtering a data.table


I need to filter a data.table for a certain date, namely always the 15th day of the month. In case this falls on a weekend the date won't be in my dataset. Then it should switch to the 16th or 17th, depending if the 15th is a Saturday or Sunday in the respective month.

library(data.table)
library(lubridate)

dt.test <- structure(list(Date = structure(c(18536, 18537, 18540, 18541, 
                                             18542, 18543, 18544, 18547, 18548, 18549, 18550, 18551, 18554, 
                                             18555, 18556, 18557, 18558, 18561, 18562, 18563, 18564, 18565, 
                                             18568, 18569, 18570, 18571, 18572, 18575, 18576, 18577, 18578, 
                                             18579, 18582, 18583, 18584, 18585, 18586, 18589, 18590, 18591, 
                                             18592, 18593, 18596, 18597, 18598, 18599, 18600, 18603, 18604, 
                                             18605, 18606, 18607, 18610, 18611, 18612, 18613, 18614, 18617, 
                                             18618, 18619, 18624, 18625, 18626), class = "Date")
                          , Week.Day = c(5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 
                                         2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 
                                         3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 2, 3, 4), 
                          A = 1:63), row.names = c(NA, -63L), class = c("data.table", 
                                                                        "data.frame"))




dt.test[day(Date) == 15]
Date          Week.Day  A
1: 2020-10-15        5 11
2: 2020-12-15        3 54

Expected output:

        Date  Week.Day  A
1: 2020-10-15        5 11
2: 2020-11-16        2 33
3: 2020-12-15        3 54

I could of course create multiple if-conditions to first filter out the months, when the 15th day is a weekend, but I am sure there's a more elegant data.table or dplyr function.


Solution

  • Here's how I would do it, using slice with which.min. You could also filter out any weekend dates, but it sounds like weekends won't appear within your data set anyway.

    dt.test %>%
        group_by(year(Date), month(Date)) %>%
        filter(day(Date) >= 15) %>%
        slice(which.min(day(Date) - 15))
    
      Date       Week.Day     A `year(Date)` `month(Date)`
      <date>        <dbl> <int>        <dbl>         <dbl>
    1 2020-10-15        5    11         2020            10
    2 2020-11-16        2    33         2020            11
    3 2020-12-15        3    54         2020            12
    

    And here's a data.table solution using similar logic

    dt.test[day(Date) >= 15, .SD[which.min(day(Date) - 15)], by = .(year(Date), month(Date))]