I am attempting to create a "24-hour" day that begins at 07:00:00 and ends at 06:59:59 using dplyr and lubridate, so that I can capture the overlap of a Platoon with a day. I have tried several approaches, including grouping, int_diff, and floor_date() + 24, but I'm struggling to get this new variable to work at all. For example, I would need 2020-01-01 10:00:00 and 2020-01-02 05:47:49 to both identify with "Day 1", but 2020-01-02 07:00:01 to identify as "Day 2", and so forth.
df_ex
platoon_id disp_time
1 PLATOON 1 2020-01-01 10:06:48
2 PLATOON 1 2020-01-01 12:56:57
3 PLATOON 2 2020-01-02 07:10:30
4 PLATOON 2 2020-01-02 09:31:28
5 PLATOON 2 2020-01-02 09:45:00
6 PLATOON 2 2020-01-02 10:11:58
7 PLATOON 2 2020-01-02 10:59:09
8 PLATOON 2 2020-01-02 14:56:57
9 PLATOON 2 2020-01-03 07:45:51
10 PLATOON 3 2020-01-03 09:20:35
11 PLATOON 3 2020-01-03 10:12:29
12 PLATOON 3 2020-01-03 10:54:31
13 PLATOON 3 2020-01-03 12:55:40
14 PLATOON 3 2020-01-03 15:19:03
15 PLATOON 3 2020-01-03 16:11:51
16 PLATOON 3 2020-01-03 18:15:51
17 PLATOON 3 2020-01-03 20:39:32
18 PLATOON 3 2020-01-03 21:26:53
19 PLATOON 3 2020-01-04 03:11:38
20 PLATOON 3 2020-01-04 06:48:16
21 PLATOON 4 2020-01-04 10:27:57
22 PLATOON 4 2020-01-04 10:43:37
23 PLATOON 4 2020-01-04 19:53:20
24 PLATOON 4 2020-01-05 03:24:08
25 PLATOON 4 2020-01-05 04:22:13
Any help would be greatly appreciated!
library(magrittr)
df_ex %>%
  dplyr::mutate(day_number = lubridate::yday(disp_time) - (lubridate::hour(disp_time) < 7))
I think that the above code gives you a new variable, day_number, that corresponds to the day number that you want.
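To try this without the full data, here is a minimal sketch that rebuilds four rows copied from the question's df_ex and runs the same pipeline. It assumes disp_time is stored as POSIXct in UTC, which the question does not state; the name df_small is only for illustration.

```r
library(magrittr)

# Four rows copied from df_ex in the question (rows 18 to 21).
# Assumption: disp_time is POSIXct in UTC.
df_small <- data.frame(
  platoon_id = c("PLATOON 3", "PLATOON 3", "PLATOON 3", "PLATOON 4"),
  disp_time  = as.POSIXct(
    c("2020-01-03 21:26:53", "2020-01-04 03:11:38",
      "2020-01-04 06:48:16", "2020-01-04 10:27:57"),
    tz = "UTC"
  )
)

# Same expression as above: day of year, minus 1 for times before 07:00.
result <- df_small %>%
  dplyr::mutate(day_number = lubridate::yday(disp_time) - (lubridate::hour(disp_time) < 7))

result$day_number  # 3 3 3 4
```

The two early-morning rows on 2020-01-04 stay with day 3, while the 10:27:57 row moves to day 4.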
First, I load the magrittr package so that I can use the pipe, %>%. Then I "pipe" your data frame to the function mutate (which is in the dplyr package). mutate takes an existing data frame and creates a new variable, in this case day_number, that is defined by the right-hand side of the equals sign. If we just wanted the day of the year for each timestamp, lubridate::yday(disp_time) alone would be enough. However, you want the 7-hour offset. In other words, 6 am on Jan 2 should return day 1, while 8 am on Jan 2 should return day 2. More exactly, any time before 7 am on day X should return day X-1. The parenthetical on the far right-hand side, (lubridate::hour(disp_time) < 7), returns TRUE or FALSE depending on whether the time of day is before 7 am. R then coerces TRUE (or FALSE) to 1 (or 0) and subtracts that quantity from the first part of the right-hand side, lubridate::yday(disp_time).
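The coercion step can be seen in isolation on the three timestamps from the question. This is a sketch only: the variable names are illustrative, and the timestamps are parsed as UTC (the default for lubridate::ymd_hms). The last lines show a different technique, not the one used above: shifting each time back by 7 hours and taking the calendar date, which may be worth considering if the data spans a year boundary, where yday() restarts at 1.

```r
# The three example timestamps from the question, parsed as UTC.
x <- lubridate::ymd_hms(c("2020-01-01 10:00:00",
                          "2020-01-02 05:47:49",
                          "2020-01-02 07:00:01"))

# Logical vector: is the time of day before 07:00?
before_7am <- lubridate::hour(x) < 7      # FALSE TRUE FALSE

# In arithmetic, TRUE is coerced to 1 and FALSE to 0.
day_number <- lubridate::yday(x) - before_7am
day_number                                # 1 1 2

# Alternative sketch: shift back 7 hours, then take the date.
shift_date <- lubridate::as_date(x - lubridate::hours(7))
shift_date                                # "2020-01-01" "2020-01-01" "2020-01-02"
```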
The :: may be foreign to some readers. It lets me call an exported function from a package's namespace without attaching the package first. So, lubridate::yday refers to the function yday in the package lubridate.
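As a small illustration (the variable names are arbitrary), the qualified call and the attached call give the same result:

```r
x <- as.POSIXct("2020-01-02 05:47:49", tz = "UTC")

# Qualified call: works without attaching lubridate.
a <- lubridate::yday(x)

# Attached call: library() puts lubridate's exports on the search path.
library(lubridate)
b <- yday(x)

identical(a, b)  # TRUE
```

The qualified form is more verbose, but it makes clear which package each function comes from.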
I find the pipe, %>%, particularly useful when working with data frames. You can read more about it in "R for Data Science", a free online book: https://r4ds.had.co.nz/
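For readers new to the pipe, a tiny sketch with made-up numbers: x %>% f() passes x as the first argument to f, so a chain of pipes reads left to right instead of inside out.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested form: read from the inside out.
nested <- sum(sqrt(x))

# Piped form: same computation, read left to right.
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)  # TRUE
```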