I have a column with timestamps formatted like this printing one row it looks like this for example:
"2022-09-21 02:02:03 UTC"
There are many rows that have the same timestamp, but I want another column with the first timestamp after a specific times based on a vector that looks like this:
start_1 <- strptime("07:15:00, format= "%H:%M:%S")
start_2 <- strptime("09:15:00, format= "%H:%M:%S")
Based on my start_1 timestamp I want to check in my df below which is the first timestamp after start_1 and mark that one in another column called start_datetime.
I only want to mark the first timestamp after the start in my dataset, this is crucial for what I am trying to do, since many of the rows have exactly the same timestamp.
Datetime | start_datetime |
---|---|
2022-09-21 07:02:03 | NA |
2022-09-21 07:02:03 | NA |
2022-09-21 07:14:03 | NA |
2022-09-21 07:16:03 | 07:16 |
2022-09-21 07:16:03 | NA |
2022-09-21 09:19:03 | 09:19 |
I've explored multiple options, but so far I've been struggling to find a solution.
I'm assuming that this is a "per day" thing, meaning that your start_1
and start_2
just reflect time-of-day (which means strptime
is not doing what you want).
If your Datetime
column is class character
, then we can make some assumptions about lexicographic sorting and do something like this:
start_1 <- "07:15:00"; start_2 <- "09:15:00"
library(dplyr)
# library(purrr) # map_dfc
quux %>%
mutate(
Date = substring(Datetime, 1, 10),
purrr::map_dfc(list(s1 = start_1, s2 = start_2),
~ if_else(substring(Datetime, nchar(Datetime) - 7, nchar(Datetime)) > .,
., .[NA]))
) %>%
group_by(Date) %>%
mutate(
across(c(s1, s2), ~ replace(., duplicated(.), NA)),
start_datetime = coalesce(s1, s2)
) %>%
ungroup() %>%
select(-Date, -s1, -s2)
# # A tibble: 6 x 2
# Datetime start_datetime
# <chr> <chr>
# 1 2022-09-21 07:02:03 NA
# 2 2022-09-21 07:02:03 NA
# 3 2022-09-21 07:14:03 NA
# 4 2022-09-21 07:16:03 07:15:00
# 5 2022-09-21 07:16:03 NA
# 6 2022-09-21 09:19:03 09:15:00
In fact, we can generalize this to any number of starts
.
fun <- function(time, starts) {
tmp <- which(
apply(outer(time, starts, `>`), 2, function(z) replace(z, c(FALSE, z[-length(z)]), FALSE)),
arr.ind = TRUE)
replace(time[NA], tmp[,"row"], starts[tmp[,"col"]])
}
starts <- c("07:15:00", "09:15:00")
quux %>%
mutate(
Date = substring(Datetime, 1, 10),
Time = substring(Datetime, 12, 19)
) %>%
group_by(Date) %>%
mutate(start_datetime = fun(Time, starts)) %>%
ungroup() %>%
select(-Date, -Time)
# # A tibble: 6 x 2
# Datetime start_datetime
# <chr> <chr>
# 1 2022-09-21 07:02:03 NA
# 2 2022-09-21 07:02:03 NA
# 3 2022-09-21 07:14:03 NA
# 4 2022-09-21 07:16:03 07:15:00
# 5 2022-09-21 07:16:03 NA
# 6 2022-09-21 09:19:03 09:15:00