r posixct

Fast way to compare time objects in R


Let's assume the following data.frame:

set.seed(20221117)
df <- data.frame(x = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"),
                 y = as.POSIXct(sample(2e9, 1e5), origin = "1970-01-01 00:00.00 UTC"))

What would be a reasonably fast way to select the maximum of x and y for each row (ideally without having to explicitly convert to double)?


Solution

  • do.call(pmax, df)
    
    [1] "2020-11-30 22:09:29 GMT" "2026-06-14 20:00:05 GMT"
    [3] "2008-02-08 01:32:23 GMT" "2021-06-17 10:44:05 GMT"
    [5] "2025-02-18 23:20:28 GMT" "1997-03-27 18:10:44 GMT"
    ...
    
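    As a minimal sketch of why this works, assuming only base R: a data.frame is a list of columns, so do.call(pmax, df) amounts to pmax(x = df$x, y = df$y), and pmax compares the POSIXct values directly and keeps their class. The data frame `small` and its values below are made up for illustration and are not part of the question.

    ```r
    # Hypothetical two-row data frame (values chosen arbitrarily)
    small <- data.frame(
      x = as.POSIXct(c("2020-01-01 00:00:00", "2021-06-01 12:00:00"), tz = "UTC"),
      y = as.POSIXct(c("2020-03-01 00:00:00", "2021-01-01 00:00:00"), tz = "UTC")
    )

    # Each column becomes one argument of pmax
    res_docall <- do.call(pmax, small)

    # The same call written out by hand
    res_direct <- pmax(small$x, small$y)

    # Dropping the column names first passes the columns positionally
    res_unnamed <- do.call(pmax, unname(as.list(small)))

    # Row 1 takes its value from y, row 2 from x; the class stays POSIXct
    print(res_docall)
    print(class(res_docall))
    ```

    The unnamed variant is only a precaution: a column that happened to be called na.rm would otherwise be matched to pmax's own na.rm argument.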

    Benchmarking

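    library(dplyr)  # provides %>%, rowwise(), mutate() and pull()

    bench::mark(
      Sindr = do.call(pmax, df),        # one vectorised call over both columns
      Tom   = df %>%
        rowwise() %>%                   # max() is then evaluated once per row
        mutate(max = max(c(x, y))) %>%
        pull(max)
    )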
    
      expression      min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
      <bch:expr> <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>
    1 Sindr        2.29ms  4.14ms   176.       6.49MB    49.9     88    25
    2 Tom           6.59s   6.59s     0.152   24.09MB     7.28     1    48
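
    The gap in the table is consistent with rowwise() calling max() once per row (1e5 calls here), whereas pmax handles whole columns in a single call. A hedged sketch, assuming dplyr is installed: the same pipeline can use pmax inside mutate() without rowwise(). The data frame `small`, the seed and the result names are illustrative only, and this variant was not part of the benchmark above.

    ```r
    library(dplyr)

    set.seed(1)  # arbitrary seed for this sketch
    small <- data.frame(
      x = as.POSIXct(sample(2e9, 10), origin = "1970-01-01", tz = "UTC"),
      y = as.POSIXct(sample(2e9, 10), origin = "1970-01-01", tz = "UTC")
    )

    # Vectorised: pmax sees the full x and y columns at once
    res_vectorised <- small %>%
      mutate(max = pmax(x, y)) %>%
      pull(max)

    # Row by row, as in the benchmarked pipeline
    res_rowwise <- small %>%
      rowwise() %>%
      mutate(max = max(c(x, y))) %>%
      pull(max)

    # Both approaches should select the same instants
    stopifnot(all(res_vectorised == res_rowwise))
    ```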