
Calculating the date from a series of consecutive time data (R)


I have a column of consecutive times (hours, minutes, seconds) with unequal intervals that extends over multiple days.

For example:

library(lubridate)    
df <- data.frame(Data  = c(1:10),
                     Time = hms(c("10:00:00","15:38:44","22:12:37",
                                  "23:59:00","00:07:28","04:56:00",
                                  "08:01:25","12:10:54","16:08:43",
                                  "20:44:44")))

I want to create a new column with the combined date and time, assuming the first data point was taken on 01.01.2020 at 10:00:00. So, for instance, data point 8 would get "02.01.2020 12:10:54".

My attempts to solve this have not been worth posting here. Does anyone have any suggestions?

Thanks so much!
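The key observation is that a new day begins whenever a time of day is earlier than the one before it. Here is a minimal base R sketch of that idea. It assumes the readings are in chronological order and that no two consecutive readings are 24 hours or more apart, and it uses UTC to avoid daylight-saving shifts. The names `times`, `secs`, `day_offset` and `start` are illustrative only.

```r
# Times of day from the question, as "HH:MM:SS" strings
times <- c("10:00:00", "15:38:44", "22:12:37", "23:59:00", "00:07:28",
           "04:56:00", "08:01:25", "12:10:54", "16:08:43", "20:44:44")

# Seconds after midnight for each reading
secs <- vapply(strsplit(times, ":", fixed = TRUE),
               function(p) sum(as.numeric(p) * c(3600, 60, 1)),
               numeric(1))

# Day counter: increases by one whenever the clock moves backwards
# (assumes consecutive readings are less than 24 hours apart)
day_offset <- c(0, cumsum(diff(secs) < 0))

# Anchor the first reading on 1 January 2020 (UTC avoids DST shifts)
start <- as.POSIXct("2020-01-01", tz = "UTC")
datetime <- start + day_offset * 86400 + secs

print(format(datetime, "%d.%m.%Y %H:%M:%S"))
```

Element 8 of the result is "02.01.2020 12:10:54", which matches the example in the question.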


Solution

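  • It's a little bit tricky, but here's a solution. The idea is to compare each time with the one before it. Whenever the time of day decreases, a new day has started, and a running count (cumulative sum) of those day changes gives the number of days to add to the start date: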

    library(tidyverse)
    library(lubridate)
    
    dataf <- tibble(Data  = c(1:16),
                     Time = hms(c("10:00:00","15:38:44","22:12:37",
                                  "23:59:00","00:07:28","04:56:00",
                                  "08:01:25","12:10:54","16:08:43",
                                  "20:44:44","00:07:28","04:56:00",
                                  "08:01:25","12:10:54","16:08:43",
                                  "20:44:44")))
    
    # Copy of the times shifted down one row, so that each row can be
    # joined to the time of the row before it
    dataf_2 <- dataf %>% 
      mutate(Data = Data + 1,
             Time_n = as.numeric(Time)) %>% 
      select(Data, Time_n)
    
    dataf %>%
      left_join(dataf_2, by = c("Data")) %>%
      # The first row has no predecessor; treat its previous time as midnight
      replace_na(list(Time_n = 0)) %>% 
      mutate(starting_date = dmy("01/01/2020")) %>% 
      # A new day starts when a time is earlier than the previous one
      mutate(change_day = if_else(as.numeric(Time) >= Time_n, 0, 1)) %>% 
      arrange(Data) %>% 
      # Running count of day changes = days elapsed since the start date
      mutate(cumsum=cumsum(change_day),
             t = paste(hour(Time), minute(Time), second(Time))) %>% 
      mutate(final_date = ymd_hms(paste(starting_date + days(cumsum), t))) %>% 
      select(Data, Time, final_date)
    

    It produces output like this (only the first 10 of the 16 rows are shown):

    # A tibble: 16 x 3
        Data Time        final_date         
       <dbl> <Period>    <dttm>             
     1     1 10H 0M 0S   2020-01-01 10:00:00
     2     2 15H 38M 44S 2020-01-01 15:38:44
     3     3 22H 12M 37S 2020-01-01 22:12:37
     4     4 23H 59M 0S  2020-01-01 23:59:00
     5     5 7M 28S      2020-01-02 00:07:28
     6     6 4H 56M 0S   2020-01-02 04:56:00
     7     7 8H 1M 25S   2020-01-02 08:01:25
     8     8 12H 10M 54S 2020-01-02 12:10:54
     9     9 16H 8M 43S  2020-01-02 16:08:43
    10    10 20H 44M 44S 2020-01-02 20:44:44
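A shorter tidyverse variant of the same approach, offered as a sketch rather than a replacement. Here `dplyr::lag()` replaces the shift-and-self-join step, and the day offsets are added as lubridate periods. It assumes dplyr and lubridate are installed. `Time` is kept as text and parsed inside `mutate()`, the result is in UTC, and, like the answer, it assumes consecutive readings are less than 24 hours apart. The column names `secs`, `new_day` and `day_offset` are illustrative.

```r
library(dplyr)
library(lubridate)

# Question's data; Time kept as "HH:MM:SS" text and parsed inside mutate()
df <- tibble(Data = 1:10,
             Time = c("10:00:00", "15:38:44", "22:12:37",
                      "23:59:00", "00:07:28", "04:56:00",
                      "08:01:25", "12:10:54", "16:08:43",
                      "20:44:44"))

start_date <- ymd("2020-01-01")

result <- df %>%
  mutate(
    # Seconds after midnight for each reading
    secs = period_to_seconds(hms(Time)),
    # 1 when this time is earlier than the previous row's time, else 0;
    # the first row is compared with itself, so it never counts as a new day
    new_day = as.integer(secs < dplyr::lag(secs, default = first(secs))),
    # Days elapsed since the start date
    day_offset = cumsum(new_day),
    # Midnight of the start date (UTC) + elapsed days + time of day
    final_date = as_datetime(start_date) + days(day_offset) + seconds(secs)
  ) %>%
  select(Data, Time, final_date)

print(result)
```

Using `lag()` avoids building the shifted copy and the join, so the whole computation stays in one pipeline. Building the date-time by arithmetic also avoids pasting the date and time into a string and parsing it again.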