r datetime lubridate posixct

Calculate the part of a duration that occurs in each hour of the day


I have a dataframe with start and end times:

  id          start_time            end_time
1  1 2018-09-02 11:13:00 2018-09-02 11:54:00
2  2 2018-09-02 14:34:00 2018-09-02 14:37:00
3  3 2018-09-02 03:00:00 2018-09-02 03:30:00
4  4 2018-09-02 03:49:00 2018-09-02 03:53:00
5  5 2018-09-02 07:05:00 2018-09-02 08:05:00
6  6 2018-09-02 06:44:00 2018-09-02 06:57:00
7  7 2018-09-02 06:04:00 2018-09-02 08:34:00
8  8 2018-09-02 07:51:00 2018-09-02 08:15:00
9  9 2018-09-02 08:16:00 2018-09-02 08:55:00

From such periods, how can I calculate the total number of minutes that occurred in each hour, each day? E.g. if a period started at 9:45 and ended at 10:15, I want to assign 15 minutes to the 9:00 hour and 15 minutes to the 10:00 hour.

Or, looking at hour 06 in the data above: that hour is covered by two different rows (periods):

6  6 2018-09-02 06:44:00 2018-09-02 06:57:00
7  7 2018-09-02 06:04:00 2018-09-02 08:34:00

In the first row, 13 minutes should be assigned to 06, and in the second row 56 minutes. That gives a total of 69 minutes for hour 06 on that date.
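The arithmetic above is an interval overlap: for each period, take the earlier of the two end points minus the later of the two start points, floored at zero. A minimal base R sketch of that idea for hour 06, assuming all times are UTC; `overlap_mins` and `to_time` are hypothetical helper names, not functions from any package:

```r
# Hypothetical helper: minutes of [start, end] that fall inside [win_start, win_end].
overlap_mins <- function(start, end, win_start, win_end) {
  secs <- pmin(as.numeric(end), as.numeric(win_end)) -
    pmax(as.numeric(start), as.numeric(win_start))
  pmax(secs, 0) / 60  # no overlap counts as zero minutes
}

# Assumption: every timestamp is in UTC.
to_time <- function(x) as.POSIXct(x, tz = "UTC")

win_start <- to_time("2018-09-02 06:00:00")
win_end   <- to_time("2018-09-02 07:00:00")

# Rows 6 and 7 of the sample data
starts <- to_time(c("2018-09-02 06:44:00", "2018-09-02 06:04:00"))
ends   <- to_time(c("2018-09-02 06:57:00", "2018-09-02 08:34:00"))

mins_06 <- overlap_mins(starts, ends, win_start, win_end)
mins_06       # 13 56
sum(mins_06)  # 69
```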

Expected output from sample data:

  hourOfDay Day        totalMinutes
  <chr>     <chr>      <drtn>      
1 03        2018-09-02  34 mins    
2 06        2018-09-02  69 mins    
3 07        2018-09-02 124 mins    
4 08        2018-09-02  93 mins    
5 11        2018-09-02  41 mins    
6 14        2018-09-02   3 mins

My attempt: I couldn't make it work with lubridate, then I found this old question here. I tried to use POSIXct, but the output is correct for some hours and incorrect for others. What am I missing here?

df %>% 
  mutate(minutes = difftime(end_time,start_time),
         hourOfDay = format(as.POSIXct(start_time), "%H"),
         Day = format(as.POSIXct(start_time),"%Y-%m-%d")) %>% 
  group_by(hourOfDay, Day) %>% 
  summarize(totalMinutes = sum(minutes))

Wrong output:

  hourOfDay Day        totalMinutes
  <chr>     <chr>      <drtn>      
1 03        2018-09-02  34 mins    
2 06        2018-09-02 163 mins    
3 07        2018-09-02  84 mins    
4 08        2018-09-02  39 mins    
5 11        2018-09-02  41 mins    
6 14        2018-09-02   3 mins

Sample data:

df <- data.frame(
  id = c(1,2,3,4,5,6,7,8,9),
  start_time = c("2018-09-02 11:13:00", "2018-09-02 14:34:00",
                 "2018-09-02 03:00:00", "2018-09-02 03:49:00",
                 "2018-09-02 07:05:00", "2018-09-02 06:44:00", "2018-09-02 06:04:00",
                 "2018-09-02 07:51:00", "2018-09-02 08:16:00"),
  end_time = c("2018-09-02 11:54:00", "2018-09-02 14:37:00",
               "2018-09-02 03:30:00", "2018-09-02 03:53:00",
               "2018-09-02 08:05:00", "2018-09-02 06:57:00", "2018-09-02 08:34:00",
               "2018-09-02 08:15:00", "2018-09-02 08:55:00"))

Solution

  • Here is an alternative solution, similar to Ronak's, but without creating a minute-by-minute data frame.

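    library(dplyr)
    library(lubridate)

    df %>%
      # One row per clock hour touched by each period (assumes start and end fall on the same day).
      mutate(hour = purrr::map2(hour(start_time), hour(end_time), seq, by = 1)) %>%
      tidyr::unnest(hour) %>%
      # 1 * turns the integer minute() result into a double, matching the 60 in the fallback branch.
      mutate(minu = case_when(
        hour(start_time) != hour & hour(end_time) == hour ~ 1 * minute(end_time),    # last hour of a longer period
        hour(start_time) == hour & hour(end_time) != hour ~ 60 - minute(start_time), # first hour of a longer period
        hour(start_time) == hour & hour(end_time) == hour ~ 1 * minute(end_time) - 1 * minute(start_time), # within one hour
        TRUE ~ 60                                                                     # full hour in between
      )) %>%
      group_by(hour) %>%
      summarise(sum(minu))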
    
    # A tibble: 6 x 2
       hour `sum(minu)`
      <dbl>       <dbl>
    1     3          34
    2     6          69
    3     7         124
    4     8          93
    5    11          41
    6    14           3
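
The result above is grouped by hour only, while the question asks for totals per hour and per day. The following base R sketch is one possible way to keep the day as well, and it should also cope with periods that run past midnight. It is not part of the original answer. It assumes all timestamps are UTC, so every clock hour lasts exactly 3600 seconds, and `split_by_hour` is a hypothetical helper name.

```r
# Sample data from the question, parsed as UTC (the time zone is an assumption).
df <- data.frame(
  id = 1:9,
  start_time = as.POSIXct(c(
    "2018-09-02 11:13:00", "2018-09-02 14:34:00", "2018-09-02 03:00:00",
    "2018-09-02 03:49:00", "2018-09-02 07:05:00", "2018-09-02 06:44:00",
    "2018-09-02 06:04:00", "2018-09-02 07:51:00", "2018-09-02 08:16:00"), tz = "UTC"),
  end_time = as.POSIXct(c(
    "2018-09-02 11:54:00", "2018-09-02 14:37:00", "2018-09-02 03:30:00",
    "2018-09-02 03:53:00", "2018-09-02 08:05:00", "2018-09-02 06:57:00",
    "2018-09-02 08:34:00", "2018-09-02 08:15:00", "2018-09-02 08:55:00"), tz = "UTC")
)

# Hypothetical helper: split one period into the clock hours it touches.
split_by_hour <- function(start, end) {
  first_bin <- as.POSIXct(trunc(start, units = "hours"))  # start of the first hour
  bins <- seq(first_bin, end, by = "hour")                # start of every hour touched
  # Overlap of [start, end] with each one-hour bin, in minutes.
  mins <- (pmin(as.numeric(end), as.numeric(bins) + 3600) -
             pmax(as.numeric(start), as.numeric(bins))) / 60
  out <- data.frame(
    Day = format(bins, "%Y-%m-%d", tz = "UTC"),
    hourOfDay = format(bins, "%H", tz = "UTC"),
    minutes = mins
  )
  out[out$minutes > 0, ]  # drop an empty final bin when a period ends exactly on the hour
}

# One call per row, then stack the pieces and sum per hour and day.
pieces <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  split_by_hour(df$start_time[i], df$end_time[i])
}))
result <- aggregate(minutes ~ hourOfDay + Day, data = pieces, FUN = sum)
result
```

For the sample data this yields 34, 69, 124, 93, 41 and 3 minutes for hours 03, 06, 07, 08, 11 and 14 on 2018-09-02, matching the expected output in the question.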