Search code examples
rlubridateposixct

check occurence of a timestamp in a column of timestamps


I have a column with timestamps formatted like this printing one row it looks like this for example:

"2022-09-21 02:02:03 UTC"

There are many rows that have the same timestamp, but I want another column with the first timestamp after a specific times based on a vector that looks like this:

start_1 <- strptime("07:15:00, format= "%H:%M:%S") 
start_2 <- strptime("09:15:00, format= "%H:%M:%S") 

Based on my start_1 timestamp I want to check in my df below which is the first timestamp after start_1 and mark that one in another column called start_datetime.

I only want to mark the first timestamp after the start in my dataset, this is crucial for what I am trying to do, since many of the rows have exactly the same timestamp.

Datetime start_datetime
2022-09-21 07:02:03 NA
2022-09-21 07:02:03 NA
2022-09-21 07:14:03 NA
2022-09-21 07:16:03 07:16
2022-09-21 07:16:03 NA
2022-09-21 09:19:03 09:19

I've explored multiple options, but so far I've been struggling to find a solution.


Solution

  • I'm assuming that this is a "per day" thing, meaning that your start_1 and start_2 just reflect time-of-day (which means strptime is not doing what you want).

    If your Datetime column is class character, then we can make some assumptions about lexicographic sorting and do something like this:

    start_1 <- "07:15:00"; start_2 <- "09:15:00"
    library(dplyr)
    # library(purrr) # map_dfc
    quux %>%
      mutate(
        Date = substring(Datetime, 1, 10),
        purrr::map_dfc(list(s1 = start_1, s2 = start_2),
                       ~ if_else(substring(Datetime, nchar(Datetime) - 7, nchar(Datetime)) > .,
                                 ., .[NA]))
      ) %>%
      group_by(Date) %>%
      mutate(
        across(c(s1, s2), ~ replace(., duplicated(.), NA)), 
        start_datetime = coalesce(s1, s2)
      ) %>%
      ungroup() %>%
      select(-Date, -s1, -s2)
    # # A tibble: 6 x 2
    #   Datetime            start_datetime
    #   <chr>               <chr>         
    # 1 2022-09-21 07:02:03 NA            
    # 2 2022-09-21 07:02:03 NA            
    # 3 2022-09-21 07:14:03 NA            
    # 4 2022-09-21 07:16:03 07:15:00      
    # 5 2022-09-21 07:16:03 NA            
    # 6 2022-09-21 09:19:03 09:15:00      
    

    In fact, we can generalize this to any number of starts.

    fun <- function(time, starts) {
      tmp <- which(
        apply(outer(time, starts, `>`), 2, function(z) replace(z, c(FALSE, z[-length(z)]), FALSE)),
        arr.ind = TRUE)
      replace(time[NA], tmp[,"row"], starts[tmp[,"col"]])
    }
    starts <- c("07:15:00", "09:15:00")
    
    quux %>%
      mutate(
        Date = substring(Datetime, 1, 10),
        Time = substring(Datetime, 12, 19)
      ) %>%
      group_by(Date) %>%
      mutate(start_datetime = fun(Time, starts)) %>%
      ungroup() %>%
      select(-Date, -Time)
    # # A tibble: 6 x 2
    #   Datetime            start_datetime
    #   <chr>               <chr>         
    # 1 2022-09-21 07:02:03 NA            
    # 2 2022-09-21 07:02:03 NA            
    # 3 2022-09-21 07:14:03 NA            
    # 4 2022-09-21 07:16:03 07:15:00      
    # 5 2022-09-21 07:16:03 NA            
    # 6 2022-09-21 09:19:03 09:15:00