Sum time across different continuous time events across date and time combinations in R


I am having trouble figuring out how to account for and sum continuous time observations across multiple dates and time events in my dataset. A similar question is found here, but it only accounts for one instance of a continuous time event. I have a dataset with multiple date and time combinations. Here is an example from that dataset, which I am manipulating in R:

date.1 <- c("2021-07-21", "2021-07-21", "2021-07-21", "2021-07-29", "2021-07-29", "2021-07-30", "2021-08-01","2021-08-01","2021-08-01")
time.1 <- c("15:57:59", "15:58:00", "15:58:01", "15:46:10", "15:46:13", "18:12:10", "18:12:10","18:12:11","18:12:13")
df <- data.frame(date.1, time.1)
df
       date.1   time.1
1 2021-07-21 15:57:59
2 2021-07-21 15:58:00
3 2021-07-21 15:58:01
4 2021-07-29 15:46:10
5 2021-07-29 15:46:13
6 2021-07-30 18:12:10
7 2021-08-01 18:12:10
8 2021-08-01 18:12:11
9 2021-08-01 18:12:13

I tried adapting the script from the linked question:

df$missingflag <-  c(1, diff(as.POSIXct(df$time.1, format="%H:%M:%S", tz="UTC"))) > 1
df
      date.1   time.1 missingflag
1 2021-07-21 15:57:59       FALSE
2 2021-07-21 15:58:00        TRUE
3 2021-07-21 15:58:01       FALSE
4 2021-07-29 15:46:10       FALSE
5 2021-07-29 15:46:13        TRUE
6 2021-07-30 18:12:10        TRUE
7 2021-08-01 18:12:10       FALSE
8 2021-08-01 18:12:11       FALSE
9 2021-08-01 18:12:13        TRUE

But it did not work as anticipated and did not get me closer to an answer. It would only have been an intermediate step anyway and probably would not have answered my question.

The goal is to account for all the continuous time observations and put them into a new table like this:

      date.1   time.1    secs
1 2021-07-21 15:57:59       3
4 2021-07-29 15:46:10       1
5 2021-07-29 15:46:13       1
6 2021-07-30 18:12:10       1
7 2021-08-01 18:12:10       2
9 2021-08-01 18:12:13       1

The table records the start time of each continuous observation together with the total number of seconds (secs) observed from that start. The script needs to account for date.1, as there are multiple dates in the dataset.

Thank you in advance.


Solution

  • You can create a datetime object by combining the date and time columns, take the difference between consecutive values, and start a new group whenever that difference exceeds 1 second, so that observations 1 second apart end up in the same group. Then, for each group, count the number of rows and keep the first datetime value.

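    library(dplyr)
    library(tidyr)
    
    df %>%
      # combine date and time into one column and parse it as a datetime
      unite(datetime, date.1, time.1, sep = ' ') %>%
      mutate(datetime = lubridate::ymd_hms(datetime)) %>%
      # start a new group whenever the gap to the previous row exceeds 1 second
      group_by(grp = cumsum(difftime(datetime, 
               lag(datetime, default = first(datetime)), units = 'secs') > 1)) %>%
      # keep each group's first datetime and its number of rows
      summarise(datetime = first(datetime), 
                secs = n(), .groups = 'drop') %>%
      select(-grp)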
    
    #  datetime             secs
    #  <dttm>              <int>
    #1 2021-07-21 15:57:59     3
    #2 2021-07-29 15:46:10     1
    #3 2021-07-29 15:46:13     1
    #4 2021-07-30 18:12:10     1
    #5 2021-08-01 18:12:10     2
    #6 2021-08-01 18:12:13     1
    

    I have kept datetime as a single combined column here, but if needed you can split it back into two columns by appending

     %>% separate(datetime, c('date', 'time'), sep = ' ')
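
If you would rather avoid the extra packages, the same idea (combine date and time, flag gaps longer than 1 second, count rows per run) can be sketched in base R. This is only a rough equivalent, not part of the answer above: the names `gap`, `grp`, `starts` and `result` are made up for illustration, and the timestamps are assumed to be UTC and already sorted.

```r
# Sample data from the question
date.1 <- c("2021-07-21", "2021-07-21", "2021-07-21", "2021-07-29", "2021-07-29",
            "2021-07-30", "2021-08-01", "2021-08-01", "2021-08-01")
time.1 <- c("15:57:59", "15:58:00", "15:58:01", "15:46:10", "15:46:13",
            "18:12:10", "18:12:10", "18:12:11", "18:12:13")
df <- data.frame(date.1, time.1)

# Combine date and time into one POSIXct vector (UTC is an assumption)
datetime <- as.POSIXct(paste(df$date.1, df$time.1),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Gap in seconds to the previous row; the first row has no predecessor, so use 0
gap <- c(0, diff(as.numeric(datetime)))

# A new run starts whenever the gap exceeds 1 second
grp <- cumsum(gap > 1)

# Keep the first row of each run and count the rows in that run
starts <- !duplicated(grp)
result <- data.frame(
  date.1 = df$date.1[starts],
  time.1 = df$time.1[starts],
  secs   = as.vector(table(grp))
)
result
#       date.1   time.1 secs
# 1 2021-07-21 15:57:59    3
# 2 2021-07-29 15:46:10    1
# 3 2021-07-29 15:46:13    1
# 4 2021-07-30 18:12:10    1
# 5 2021-08-01 18:12:10    2
# 6 2021-08-01 18:12:13    1
```

As in the dplyr version, secs is the number of rows in each run, so it only equals elapsed seconds when there is exactly one observation per second.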