Let's assume the following data.frame:
set.seed(20221117)
df <- data.frame(x = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"),
                 y = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"))
What would be a reasonably fast way to select the maximum for each row (ideally without having to explicitly convert to double)?
do.call(pmax, df)
[1] "2020-11-30 22:09:29 GMT" "2026-06-14 20:00:05 GMT"
[3] "2008-02-08 01:32:23 GMT" "2021-06-17 10:44:05 GMT"
[5] "2025-02-18 23:20:28 GMT" "1997-03-27 18:10:44 GMT"
...
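Why this works: a data.frame is a list of columns, so do.call() hands each column to pmax() as a separate argument, and pmax() compares them element-wise while keeping the POSIXct class. A minimal sketch on a three-row example (the names small, row_max and row_max_narm are made up for illustration, and the NA is added only to show the na.rm option):

```r
# Hypothetical toy data; not part of the original question.
small <- data.frame(
  x = as.POSIXct(c(10, 500, 42), origin = "1970-01-01", tz = "UTC"),
  y = as.POSIXct(c(20, 100, NA), origin = "1970-01-01", tz = "UTC")
)

# Same call as pmax(small$x, small$y): one argument per column.
row_max <- do.call(pmax, small)

# Extra arguments such as na.rm can be appended to the argument list.
row_max_narm <- do.call(pmax, c(small, na.rm = TRUE))

class(row_max)  # still POSIXct, no manual conversion to double
```

The same call should extend to any number of POSIXct columns, since every column simply becomes one more argument to pmax().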
Benchmarking
library(dplyr)  # provides %>%, rowwise(), mutate() and pull()

bench::mark(
  Sindr = do.call(pmax, df),
  Tom = df %>%
    rowwise() %>%
    mutate(max = max(c(x, y))) %>%
    pull(max)
)
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
  <bch:expr> <bch:tm>  <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
1 Sindr        2.29ms   4.14ms   176.       6.49MB    49.9     88    25
2 Tom           6.59s    6.59s     0.152   24.09MB     7.28     1    48
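The gap is expected: rowwise() makes mutate() evaluate max() once per row, whereas pmax() handles the whole columns in a single vectorized call. If a dplyr pipeline is preferred, pmax() can be used inside mutate() without rowwise(). The sketch below uses a made-up two-row data frame (small and vec_max are illustrative names); this variant was not part of the benchmark above, so no timing is claimed for it.

```r
library(dplyr)

# Hypothetical toy data; not part of the original benchmark.
small <- data.frame(
  x = as.POSIXct(c(10, 500), origin = "1970-01-01", tz = "UTC"),
  y = as.POSIXct(c(20, 100), origin = "1970-01-01", tz = "UTC")
)

# Vectorized row-wise maximum: pmax() sees the full columns x and y at once.
vec_max <- small %>%
  mutate(max = pmax(x, y)) %>%
  pull(max)
```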