I have a dataframe with start and end times:
id start_time end_time
1 1 2018-09-02 11:13:00 2018-09-02 11:54:00
2 2 2018-09-02 14:34:00 2018-09-02 14:37:00
3 3 2018-09-02 03:00:00 2018-09-02 03:30:00
4 4 2018-09-02 03:49:00 2018-09-02 03:53:00
5 5 2018-09-02 07:05:00 2018-09-02 08:05:00
6 6 2018-09-02 06:44:00 2018-09-02 06:57:00
7 7 2018-09-02 06:04:00 2018-09-02 08:34:00
8 8 2018-09-02 07:51:00 2018-09-02 08:15:00
9 9 2018-09-02 08:16:00 2018-09-02 08:55:00
From such periods, how can I calculate the total number of minutes that occurred in each hour, each day? E.g. if a period started at 9:45 and ended at 10:15, I want to assign 15 minutes to the 9:00 hour and 15 minutes to the 10:00 hour.
Or checking the hour 06
in the data above, that hour is included in two different rows (periods):
6 6 2018-09-02 06:44:00 2018-09-02 06:57:00
7 7 2018-09-02 06:04:00 2018-09-02 08:34:00
In the first row, 13 minutes should be assigned to 06
, and in the second row 56 minutes. Thus, a total of 69 minutes for the hour 06
that date.
Expected output from sample data:
hourOfDay Day totalMinutes
<chr> <chr> <drtn>
1 03 2018-09-02 34 mins
2 06 2018-09-02 69 mins
3 07 2018-09-02 124 mins
4 08 2018-09-02 93 mins
5 11 2018-09-02 41 mins
6 14 2018-09-02 3 mins
My attempt: I couldn't make it with lubridate
, then I found this old question here. I tried to use POSIXct
, but the output is correct for some hours and incorrect for another hours. What am I missing here?
df %>%
mutate(minutes = difftime(end_time,start_time),
hourOfDay = format(as.POSIXct(start_time), "%H"),
Day = format(as.POSIXct(start_time),"%Y-%m-%d")) %>%
group_by(hourOfDay, Day) %>%
summarize(totalMinutes = sum(minutes))
Wrong output:
hourOfDay Day totalMinutes
<chr> <chr> <drtn>
1 03 2018-09-02 34 mins
2 06 2018-09-02 163 mins
3 07 2018-09-02 84 mins
4 08 2018-09-02 39 mins
5 11 2018-09-02 41 mins
6 14 2018-09-02 3 mins
Sample data :
df <- data.frame(
id = c(1,2,3,4,5,6,7,8,9),
start_time = c("2018-09-02 11:13:00", "2018-09-02 14:34:00",
"2018-09-02 03:00:00", "2018-09-02 03:49:00",
"2018-09-02 07:05:00", "2018-09-02 06:44:00", "2018-09-02 06:04:00",
"2018-09-02 07:51:00", "2018-09-02 08:16:00"),
end_time = c("2018-09-02 11:54:00", "2018-09-02 14:37:00",
"2018-09-02 03:30:00", "2018-09-02 03:53:00",
"2018-09-02 08:05:00", "2018-09-02 06:57:00", "2018-09-02 08:34:00",
"2018-09-02 08:15:00", "2018-09-02 08:55:00"))
Here is an alternate solution, similar to Ronak's but without creating a minute-by-minute data-frame.
library(dplyr)
library(lubridate)
df %>%
mutate(hour = (purrr::map2(hour(start_time), hour(end_time), seq, by = 1))) %>%
tidyr::unnest(hour) %>% mutate(minu=case_when(hour(start_time)!=hour & hour(end_time)==hour ~ 1*minute(end_time),
hour(start_time)==hour & hour(end_time)!=hour ~ 60-minute(start_time),
hour(start_time)==hour & hour(end_time)==hour ~ 1*minute(end_time)-1*minute(start_time),
TRUE ~ 60)) %>% group_by(hour) %>% summarise(sum(minu))
# A tibble: 6 x 2
hour `sum(minu)`
<dbl> <dbl>
1 3 34
2 6 69
3 7 124
4 8 93
5 11 41
6 14 3