Tags: r, floating-point, timestamp, lubridate, nanotime

how to safely store millisecond differences between timestamps?


This is a hellish question related to floating-point approximations and timestamps in R. Get ready :) Consider this simple example:

library(tibble)
library(lubridate)
library(dplyr)

tibble(timestamp_chr1 = c('2014-01-02 01:35:50.858'),
       timestamp_chr2 = c('2014-01-02 01:35:50.800')) %>% 
  mutate(time1 = lubridate::ymd_hms(timestamp_chr1),
         time2 = lubridate::ymd_hms(timestamp_chr2),
         timediff = as.numeric(time1 - time2))


# A tibble: 1 x 5
  timestamp_chr1          timestamp_chr2          time1                      time2                       timediff
  <chr>                   <chr>                   <dttm>                     <dttm>                         <dbl>
1 2014-01-02 01:35:50.858 2014-01-02 01:35:50.800 2014-01-02 01:35:50.858000 2014-01-02 01:35:50.799999 0.0580001

Here the time difference between the two timestamps is obviously 58 milliseconds, yet R stores it with a floating-point approximation, so it appears as 0.0580001 seconds.

What is the safest way to get exactly 58 milliseconds as an answer instead? I thought about using as.integer (instead of as.numeric), but I am worried about some loss of information. What can be done here?

Thanks!


Solution

  • A few considerations, some of which I think you already know:

    • floating-point will rarely give you perfectly 58 milliseconds (due to R FAQ 7.31 and IEEE-754);

    • display of the data can be managed on the console with options(digits.secs=3) (and digits=3) and in reports with sprintf, format, or round;

    • calculation "goodness" can be improved if you round before calculation; while this is a little more onerous, as long as we can safely assume that the data is accurate to at least milliseconds, this holds mathematically.

    If you're concerned about introducing errors in the data, though, an alternative is to encode the values as milliseconds (instead of the R norm of seconds). If you can choose an arbitrary and recent reference point, then a normal 32-bit integer works, but only for spans under about 24.8 days (2^31 - 1 milliseconds). If that is insufficient, or you prefer epoch milliseconds, then you need to jump to 64-bit integers, perhaps with bit64.

    now <- Sys.time()
    as.integer(now)                      # whole seconds fit in a 32-bit integer
    # [1] 1583507603
    as.integer(as.numeric(now) * 1000)   # milliseconds overflow .Machine$integer.max
    # Warning: NAs introduced by coercion to integer range
    # [1] NA
    bit64::as.integer64(as.numeric(now) * 1000)  # 64-bit integers can hold epoch milliseconds
    # integer64
    # [1] 1583507603439
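    To make this concrete with the timestamps from the question, here is a sketch of both approaches (using lubridate and bit64 as above; the rounding steps assume the data is accurate to the millisecond):

    library(lubridate)

    t1 <- ymd_hms("2014-01-02 01:35:50.858")
    t2 <- ymd_hms("2014-01-02 01:35:50.800")

    # Option 1: round the floating-point difference to milliseconds
    round(as.numeric(t1 - t2), 3)
    # [1] 0.058

    # Option 2: encode each timestamp as epoch milliseconds in integer64,
    # rounding first so the double-to-integer64 conversion cannot truncate
    ms1 <- bit64::as.integer64(round(as.numeric(t1) * 1000))
    ms2 <- bit64::as.integer64(round(as.numeric(t2) * 1000))
    ms1 - ms2
    # integer64
    # [1] 58

    With Option 2 the difference is an exact integer count of milliseconds, so no floating-point dust can accumulate across further arithmetic.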