r datetime dplyr lubridate

Creating offset intervals for days with dplyr and lubridate


I am attempting to create a "24-hour" day that begins at 07:00:00 and ends at 06:59:59 using dplyr and lubridate, so that I can capture the overlap of a Platoon with a day. I have tried several approaches, including grouping, int_diff, and floor_date() + 24, but I am struggling to get the new variable to work at all. For example, I need 2020-01-01 10:00:00 and 2020-01-02 05:47:49 to both be identified as "Day 1", but 2020-01-02 07:00:01 to be identified as "Day 2", and so forth.

df_ex
   platoon_id           disp_time
1   PLATOON 1 2020-01-01 10:06:48
2   PLATOON 1 2020-01-01 12:56:57
3   PLATOON 2 2020-01-02 07:10:30
4   PLATOON 2 2020-01-02 09:31:28
5   PLATOON 2 2020-01-02 09:45:00
6   PLATOON 2 2020-01-02 10:11:58
7   PLATOON 2 2020-01-02 10:59:09
8   PLATOON 2 2020-01-02 14:56:57
9   PLATOON 2 2020-01-03 07:45:51
10  PLATOON 3 2020-01-03 09:20:35
11  PLATOON 3 2020-01-03 10:12:29
12  PLATOON 3 2020-01-03 10:54:31
13  PLATOON 3 2020-01-03 12:55:40
14  PLATOON 3 2020-01-03 15:19:03
15  PLATOON 3 2020-01-03 16:11:51
16  PLATOON 3 2020-01-03 18:15:51
17  PLATOON 3 2020-01-03 20:39:32
18  PLATOON 3 2020-01-03 21:26:53
19  PLATOON 3 2020-01-04 03:11:38
20  PLATOON 3 2020-01-04 06:48:16
21  PLATOON 4 2020-01-04 10:27:57
22  PLATOON 4 2020-01-04 10:43:37
23  PLATOON 4 2020-01-04 19:53:20
24  PLATOON 4 2020-01-05 03:24:08
25  PLATOON 4 2020-01-05 04:22:13

Any help would be greatly appreciated!


Solution

  • library(magrittr)  # provides the pipe, %>%
    df_ex %>%
      # day of the year, minus 1 when the time of day is before 07:00
      dplyr::mutate(day_number = lubridate::yday(disp_time) - (lubridate::hour(disp_time) < 7))
    
    

    I think the code above gives you a new variable, day_number, that corresponds to the day number you want.

    First, I load the package magrittr so that I can use the pipe, %>%. Then I "pipe" your data frame to the function mutate (which is in the dplyr package). mutate takes an existing data frame and creates a new variable, in this case day_number, that is defined by the right-hand side of the equals sign. If we just wanted the number (in the year) of each day, we could stop at lubridate::yday(disp_time).

    However, you want the 7-hour offset. In other words, 6am on Jan 2 should return day 1, while 8am on Jan 2 should return day 2. More exactly, any time before 7am on day X should return day X-1. The parenthetical on the far right-hand side, (lubridate::hour(disp_time) < 7), returns TRUE or FALSE depending on whether the time of day is earlier than 7am. R then coerces TRUE (or FALSE) to 1 (or 0) and subtracts that quantity from the first part of the right-hand side, lubridate::yday(disp_time).
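
The coercion step described above can be seen in isolation with a small sketch. The vector x below is not from the original data frame; it holds the three example timestamps from the question, parsed as UTC with lubridate::ymd_hms, which is an assumption about how disp_time is stored.

```r
library(lubridate)

# The three example timestamps from the question (parsed as UTC; an assumption)
x <- ymd_hms(c("2020-01-01 10:00:00",
               "2020-01-02 05:47:49",
               "2020-01-02 07:00:01"))

# Logical vector: is the time of day earlier than 07:00?
before_7 <- hour(x) < 7            # FALSE TRUE FALSE

# In arithmetic, R treats TRUE as 1 and FALSE as 0
day_number <- yday(x) - before_7   # 1 1 2
print(day_number)
```

The second timestamp falls on calendar day 2 of the year, but because it is before 07:00 the subtraction moves it back to day 1.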

    The :: may be foreign to some readers. It allows me to call exported functions from a namespace (or package). So, lubridate::yday refers to the function yday in the package lubridate.

    I find the pipe, %>%, particularly useful when working with data frames. You can read more about it in "R for Data Science", a free online book: https://r4ds.had.co.nz/
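
One caveat with the yday approach: yday restarts at 1 every January 1, so a time before 07:00 on January 1 comes out as day 0, and the numbering jumps across a year boundary. A possible variation, offered only as a sketch and not as part of the original answer, is to shift each timestamp back by 7 hours, take the calendar date of the result, and number the days relative to the earliest shifted date. The data frame df_small, its timestamps, and the column shift_date are hypothetical and chosen to straddle a year boundary; the timestamps are assumed to be in UTC, so daylight saving time is not considered.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps straddling a year boundary (not the question's data)
df_small <- tibble(
  disp_time = ymd_hms(c("2019-12-31 23:30:00",
                        "2020-01-01 03:00:00",
                        "2020-01-01 10:06:48",
                        "2020-01-02 05:47:49",
                        "2020-01-02 07:00:01"))
)

df_small <- df_small %>%
  mutate(
    # Moving every time back 7 hours makes 07:00 behave like midnight
    shift_date = as_date(disp_time - hours(7)),
    # Number the shifted dates 1, 2, 3, ... starting from the earliest one
    day_number = as.integer(shift_date - min(shift_date)) + 1L
  )

print(df_small$day_number)  # 1 1 2 2 3
```

Here the first "day" runs from 07:00 on 2019-12-31 to 06:59:59 on 2020-01-01, so the first two rows share day 1 even though they fall in different calendar years. Whether numbering should start from the earliest observation, as assumed here, or from some fixed reference date depends on your data.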