Search code examples
rdatetimeposixct

Splitting a dateTime vector if time is greater than x between vector components


I have the following data:

df <- data.frame(index = 1:85,
                 times = c(seq(as.POSIXct("2020-10-03 21:31:00 UTC"),
                               as.POSIXct("2020-10-03 22:25:00 UTC")
                               "min"),
                           seq(as.POSIXct("2020-11-03 10:10:00 UTC"),
                               as.POSIXct("2020-11-03 10:39:00 UTC"),
                               "min")
                           ))

if we look at row 55 and 56 there is a clear divide in times:

> df[55:56, ]
   index               times
55    55 2020-10-03 22:25:00
56    56 2020-11-03 10:10:00

I would like to add a third categorical column split based on the splits,

e.g. row df$split[55, ] = A and row df$split[56, ] = B

logic like

If time gap between rows is greater than 5 mins start new category for subsequent rows until the next instance where time gap > 5 mins.

thanks


Solution

  • You could use

    library(dplyr)
    
    df %>% 
      mutate(cat = 1 + cumsum(c(0, diff(times)) > 5))
    

    which returns

       index               times cat
    1      1 2020-10-03 21:31:00   1
    2      2 2020-10-03 21:32:00   1
    3      3 2020-10-03 21:33:00   1
    4      4 2020-10-03 21:34:00   1
    5      5 2020-10-03 21:35:00   1
    6      6 2020-10-03 21:36:00   1
    7      7 2020-10-03 21:37:00   1
    8      8 2020-10-03 21:38:00   1
    ...
    53    53 2020-10-03 22:23:00   1
    54    54 2020-10-03 22:24:00   1
    55    55 2020-10-03 22:25:00   1
    56    56 2020-11-03 10:10:00   2
    57    57 2020-11-03 10:11:00   2
    58    58 2020-11-03 10:12:00   2
    59    59 2020-11-03 10:13:00   2
    

    If you need letters or something else, you could for example use

    df %>% 
      mutate(cat = LETTERS[1 + cumsum(c(0, diff(times)) > 5)])
    

    to convert the categories 1 and 2 into A and B.