
How to break down datetime object or interval object into minute by minute per row in R


I have a dataset with columns datetime (start) and datetime_end. After some data manipulation I want to break each interval down into one row per minute. For example, given this interval:

datetime                datetime_end          id   disc
2019-03-19 12:47:28     2019-03-19 12:50:37   5-3 start

I would like to break it down by minute to get something like this:

    datetime             id   disc
    2019-03-19 12:48:00  5-3  start
    2019-03-19 12:49:00  5-3  start
    2019-03-19 12:50:00  5-3  start
    2019-03-19 12:51:00  5-3  start
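For a single interval like the one above, the minute grid can be built directly with base `seq()` and `lubridate::ceiling_date()` (a minimal, self-contained sketch using the timestamps from the example):

```r
library(lubridate)

start <- as_datetime("2019-03-19 12:47:28")
end   <- as_datetime("2019-03-19 12:50:37")

# One-minute sequence from the minute ceiling of the start
# to the minute ceiling of the end.
minutes <- seq(from = ceiling_date(start, "minute"),
               to   = ceiling_date(end, "minute"),
               by   = "min")
minutes
# 12:48:00, 12:49:00, 12:50:00, 12:51:00 (UTC)
```

The per-row version of this for a whole data frame is what the question is really after.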

Here is a dummy data frame:

df1 <- data.frame(stringsAsFactors=FALSE,
                  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
                               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
                               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
                               "2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
                               "2019-03-19T12:46:42Z"),
                  id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
                         "5-3"),
                  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
                           "stop", "start", "stop")
)

I tried to use the lubridate::interval() function to build an interval object (the travel interval), but I am struggling to break it down into one row per minute as shown above. If someone knows a solution, I would really appreciate it.

Here is my script:

library(tidyverse)
library(lubridate)
df <- df1 %>% 
  mutate(datetime = lubridate::as_datetime(datetime)) %>% 
  arrange(datetime) %>% 
  mutate(datetime_end = lead(datetime), 
         # Create an interval object.
         Travel_Interval = 
           lubridate::interval(start = datetime, end = datetime_end)) %>% 
  filter(!is.na(Travel_Interval)) %>% 
  select(datetime, datetime_end, id, disc, Travel_Interval) %>% 
  filter(disc == "start")

Solution

  • I'd use purrr::map2() for this:

    # Take df1, convert the datetime column to datetime format, and sort by
    # datetime. Add datetime_end as the lead of datetime and drop records
    # with no recorded datetime_end. Then create the column 'minute' with
    # purrr::map2(), iterating over each (datetime, datetime_end) pair and
    # building a one-minute sequence of timestamps from the minute ceiling
    # of datetime to the minute ceiling of datetime_end. Since the resulting
    # column is a list, finish by unnesting it.
    df <- df1 %>% 
      mutate(datetime = as_datetime(datetime)) %>% 
      arrange(datetime) %>% 
      mutate(datetime_end = lead(datetime, n = 1L)) %>% 
      filter(!is.na(datetime_end)) %>% 
      mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
        seq.POSIXt(from = ceiling_date(start, "minute"),
                   to   = ceiling_date(stop, "minute"),
                   by   = "min")
      })) %>% 
      unnest(cols = minute)
    

    Note, however, that since you are effectively cutting timestamps into minute intervals using some form of rounding (taking the ceiling, in this instance), you will have to decide how to handle boundary cases. For example, the last row of the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, and the first row of the subsequent disc == "start" run also starts with minute == 2019-03-19 12:48:00:

                  datetime  id  disc        datetime_end              minute
    1  2019-03-19 12:46:42 5-3  stop 2019-03-19 12:47:28 2019-03-19 12:47:00
    2  2019-03-19 12:46:42 5-3  stop 2019-03-19 12:47:28 2019-03-19 12:48:00
    3  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:48:00
    4  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:49:00
    5  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:50:00
    6  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:51:00
    7  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:52:00
    8  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:53:00
    9  2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:54:00
    10 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:55:00
    11 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:56:00
    12 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:57:00
    13 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:58:00
    14 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:59:00
    15 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:00:00
    16 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:01:00
    17 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:02:00
    18 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:03:00
    19 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:04:00
    20 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:05:00
    21 2019-03-19 13:04:37 5-3  stop 2019-03-19 13:05:21 2019-03-19 13:05:00
    22 2019-03-19 13:04:37 5-3  stop 2019-03-19 13:05:21 2019-03-19 13:06:00
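If those overlapping boundary minutes are unwanted, one option (a sketch, not the only sensible choice) is to make each run's grid half-open: take the ceiling of datetime but the floor of datetime_end, so a boundary minute belongs only to the run in which it starts. Note the guard for intervals that fall entirely inside one minute, whose grid is empty and is dropped by unnest():

```r
library(tidyverse)
library(lubridate)

df1 <- data.frame(stringsAsFactors = FALSE,
                  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
                               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
                               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z",
                               "2019-03-19T13:05:21Z", "2019-03-19T13:04:37Z",
                               "2019-03-19T12:47:28Z", "2019-03-19T12:46:42Z"),
                  id = rep("5-3", 10),
                  disc = c("car", "stop", "start", "stop", "start",
                           "stop", "start", "stop", "start", "stop"))

# Half-open minute grid: ceiling of the start, floor of the end.
minute_grid <- function(start, stop) {
  from <- ceiling_date(start, "minute")
  to   <- floor_date(stop, "minute")
  if (from > to) return(from[0])  # interval sits inside a single minute
  seq(from, to, by = "min")
}

df <- df1 %>% 
  mutate(datetime = as_datetime(datetime)) %>% 
  arrange(datetime) %>% 
  mutate(datetime_end = lead(datetime)) %>% 
  filter(!is.na(datetime_end)) %>% 
  mutate(minute = purrr::map2(datetime, datetime_end, minute_grid)) %>% 
  unnest(cols = minute)
```

With this variant, each minute stamp appears in exactly one run; the first "stop" run now contributes only 12:47:00, and the following "start" run begins at 12:48:00.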