Issue with dplyr and cut for time intervals

I am trying to split by data into 5 sec time intervals and group them using dplyr.
Below is my original data - I have date and time in separate columns which I have later combined using Posixct

structure(list(Date = c("10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013", "10/30/2013"), Time = c("20:06:57", "20:07:13", "20:07:25", "20:07:30", "20:08:16", "20:08:17", "20:08:26", "20:09:05", "20:09:06", "20:09:07", "20:09:37", "20:09:38", "20:09:55", "20:12:34", "20:14:15"), ID = c("M1", "M1", "M1", "M3", "M1", "M1", "M8", "M9", "M9", "M9", "M1", "M1", "M1", "M5", "M1")), .Names = c("Date", "Time", "ID"), class = "data.frame", row.names = c(NA, -15L))

Attaching my code below

data$datetime <- as.POSIXct(paste(data$Date, data$Time), format="%m/%d/%Y %H:%M:%S") 
data_order <- data %>%  arrange(datetime,ID)      
data_order$group <-  data_order  %>% group_by(by5sec=cut(datetime, breaks= "5 secs",right =T),ID) %>% group_indices()

While some observations are grouped right , some are wrong. I have tried 2 versions - removing the "right=T" and keeping it and I got different groups , but had errors in both versions. I have also tried using as.numeric,as.posixct et all before the cut in vain

Attaching the output for both versions.Red ones are wrongly coded as 2 different groups.

****Version 1 "right = T" for cut****

****Version 2 "right = F" for cut****

Could someone please help in solving this,I have spent quite some time and its been a goose chase given my knowledge of R . All I want is 5 sec breaks for the same ID(group should change for a new ID).

Desired output

Solution

I'm not entirely clear on the output images you show. Based on your problem description, how about something like this?

library(tidyverse);
df %>%
    unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
    mutate(
        datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
        datetime.by5sec = as.numeric(cut(datetime, "sec")) %/% 5 + 1);
#        datetime       Date     Time ID datetime.by5sec
#1  2013-10-30 20:06:57 10/30/2013 20:06:57 M1               1
#2  2013-10-30 20:07:13 10/30/2013 20:07:13 M1               4
#3  2013-10-30 20:07:25 10/30/2013 20:07:25 M1               6
#4  2013-10-30 20:07:30 10/30/2013 20:07:30 M3               7
#5  2013-10-30 20:08:16 10/30/2013 20:08:16 M1              17
#6  2013-10-30 20:08:17 10/30/2013 20:08:17 M1              17
#7  2013-10-30 20:08:26 10/30/2013 20:08:26 M8              19
#8  2013-10-30 20:09:05 10/30/2013 20:09:05 M9              26
#9  2013-10-30 20:09:06 10/30/2013 20:09:06 M9              27
#10 2013-10-30 20:09:07 10/30/2013 20:09:07 M9              27
#11 2013-10-30 20:09:37 10/30/2013 20:09:37 M1              33
#12 2013-10-30 20:09:38 10/30/2013 20:09:38 M1              33
#13 2013-10-30 20:09:55 10/30/2013 20:09:55 M1              36
#14 2013-10-30 20:12:34 10/30/2013 20:12:34 M5              68
#15 2013-10-30 20:14:15 10/30/2013 20:14:15 M1              88

Explanation: datetime.by5sec gives the 5 sec bin index into which datetime falls. So the first entry sits in bin 1. The second entry is within the 4th 5 sec bin, i.e. within 20 seconds from the first entry, and so on. Here I made use of integer division %/% 5, since cut.POSIXct only allows you to bin by second as interval.

Update

The following reproduces your expected output:

df %>%
    unite(datetime, 1:2, sep = " ", remove = FALSE) %>%
    group_by(ID) %>%
    mutate(
        datetime = as.POSIXct(datetime, format = "%m/%d/%Y %H:%M:%S"),
        difftime = difftime(datetime, lag(datetime, default = 0))) %>%
    ungroup() %>%
    mutate(
        group = cumsum(abs(difftime) >= 5)) %>%
    select(Date, Time, ID, datetime, group);
## A tibble: 15 x 5
#   Date       Time     ID    datetime            group
#   <chr>      <chr>    <chr> <dttm>              <int>
# 1 10/30/2013 20:06:57 M1    2013-10-30 20:06:57     1
# 2 10/30/2013 20:07:13 M1    2013-10-30 20:07:13     2
# 3 10/30/2013 20:07:25 M1    2013-10-30 20:07:25     3
# 4 10/30/2013 20:07:30 M3    2013-10-30 20:07:30     4
# 5 10/30/2013 20:08:16 M1    2013-10-30 20:08:16     5
# 6 10/30/2013 20:08:17 M1    2013-10-30 20:08:17     5
# 7 10/30/2013 20:08:26 M8    2013-10-30 20:08:26     6
# 8 10/30/2013 20:09:05 M9    2013-10-30 20:09:05     7
# 9 10/30/2013 20:09:06 M9    2013-10-30 20:09:06     7
#10 10/30/2013 20:09:07 M9    2013-10-30 20:09:07     7
#11 10/30/2013 20:09:37 M1    2013-10-30 20:09:37     8
#12 10/30/2013 20:09:38 M1    2013-10-30 20:09:38     8
#13 10/30/2013 20:09:55 M1    2013-10-30 20:09:55     9
#14 10/30/2013 20:12:34 M5    2013-10-30 20:12:34    10
#15 10/30/2013 20:14:15 M1    2013-10-30 20:14:15    11

Explanation: Calculate the time difference between two successive datetime entries, grouped by ID; group is then the cumulative sum of all time differences >=5.