r datetime unix-timestamp lubridate

integer64 to datetime conversion issue in R


Given the following data frame with a column of integer64 Unix epoch values:

data_df <- structure(list(time_stamp = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
), class = "integer64")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

I want to convert it to a date-time (with as.POSIXct() or anytime()), but the conversion fails:

library(dplyr)    # provides %>%, select() and mutate()
library(anytime)  # provides anytime()

data_df %>%
  dplyr::select(time_stamp) %>%
  head(10) %>%
  dplyr::mutate(dt = anytime(time_stamp)) %>% dput()

Gives:

structure(list(time_stamp = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
    ), class = "integer64"), dt = structure(c(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396, 
    0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396
    ), class = c("POSIXct", "POSIXt"), tzone = "Etc/UTC")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -10L))

data_df %>%
  dplyr::select(time_stamp) %>% 
  head(10) %>%
  dplyr::mutate(dt = as.POSIXct(time_stamp))

Error in as.POSIXct.default(time_stamp) : do not know how to convert 'time_stamp' to class “POSIXct”

Please advise on how to deal with integer64 epoch times.


Solution

  • Pardon the direct language, but your question makes no sense. Taking the first element of your dataset: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000282505613660396. That is simply not representable in any of the datatypes you listed. Including integer64. Full stop.

    Now, it so happens that my nanotime package does this at the best available resolution, which is nanoseconds represented as 64-bit integers. And 64-bit integers allow for nanosecond increments since the epoch, at about 19 digits of precision. Not the 100+ digits you demanded. No (small memory) variable can offer that.
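
    To make the 19-digit point concrete, here is a minimal sketch using only bit64, the package behind integer64. The timestamp is a made-up nanosecond count, not a value taken from the question's data.

```r
library(bit64)

# Largest value a signed 64-bit integer can hold, built from a string so
# that no precision is lost on the way in.
max_i64 <- as.integer64("9223372036854775807")

# Number of decimal digits in that value.
n_digits <- nchar(as.character(max_i64))

# A hypothetical timestamp: nanoseconds since the epoch.
ns <- as.integer64("1552248528165136001")

# integer64 arithmetic keeps one-nanosecond steps apart ...
next_ns <- as.character(ns + 1L)

# ... whereas a double keeps only about 15 to 16 significant digits, so
# both values collapse to the same number once converted.
same_as_double <- as.double(ns) == as.double(ns + 1L)

print(n_digits)
print(next_ns)
print(same_as_double)
```

    That is the reason for storing nanoseconds in a 64-bit integer rather than in a plain double.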

    As for nanotime, the example() shows some uses, including parsing:

    R> library(nanotime)
    R> example(nanotime)
    
    nanotmR> x <- nanotime("1970-01-01T00:00:00.000000001+00:00")
    
    nanotmR> print(x)
    [1] "1970-01-01T00:00:00.000000001+00:00"
    
    nanotmR> x <- x + 1
    
    nanotmR> print(x)
    [1] "1970-01-01T00:00:00.000000002+00:00"
    
    nanotmR> format(x)
    [1] "1970-01-01T00:00:00.000000002+00:00"
    
    nanotmR> x <- x + 10
    
    nanotmR> print(x)
    [1] "1970-01-01T00:00:00.000000012+00:00"
    
    nanotmR> format(x)
    [1] "1970-01-01T00:00:00.000000012+00:00"
    
    nanotmR> format(nanotime(Sys.time()) + 1:3)  # three elements each 1 ns apart
    [1] "2019-03-10T20:06:53.534292001+00:00" "2019-03-10T20:06:53.534292002+00:00" 
    [3] "2019-03-10T20:06:53.534292003+00:00"
    R> 
    

    Best of all, data.table has support for the integer64 type of the bit64 package, which is used here. Building on the example:

    R> library(data.table)
    data.table 1.12.0  Latest news: r-datatable.com
    R> dt <- data.table(ns = nanotime(Sys.time()) + 1:3)
    R> dt[]
                                        ns
    1: 2019-03-10T20:08:48.165136001+00:00
    2: 2019-03-10T20:08:48.165136002+00:00
    3: 2019-03-10T20:08:48.165136003+00:00
    R> dt[, pt := as.POSIXct(ns)]
    R> dt[]
                                        ns                         pt
    1: 2019-03-10T20:08:48.165136001+00:00 2019-03-10 15:08:48.165136
    2: 2019-03-10T20:08:48.165136002+00:00 2019-03-10 15:08:48.165136
    3: 2019-03-10T20:08:48.165136003+00:00 2019-03-10 15:08:48.165136
    R> 
    

    I use this dual representation all day long: nanosecond granularity in a nanotime column alongside a POSIXct column for general R use, including plotting. (Note that there is a formatting mishap which shows the nanotime / integer64 column in UTC, but the underlying representation is sound and correct, as the pt conversion to POSIXct shows. It is currently just after 3pm in my timezone.)
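
    For epoch values that do fit in an integer64, one possible route to POSIXct without nanotime is sketched below. It assumes the column holds either seconds or nanoseconds since 1970-01-01 UTC; the question does not say which, and both values used here are hypothetical.

```r
library(bit64)

# Hypothetical epoch values; the unit of the original column is unknown,
# so both seconds and nanoseconds are shown.
secs  <- as.integer64("1552248528")
nanos <- as.integer64("1552248528165136001")

# Seconds since the epoch: drop to double first, because as.POSIXct()
# has no method for the integer64 class.
dt_secs <- as.POSIXct(as.double(secs), origin = "1970-01-01", tz = "UTC")

# Nanoseconds since the epoch: convert to double and scale to seconds.
# Digits below roughly a microsecond are lost in this step.
dt_nanos <- as.POSIXct(as.double(nanos) / 1e9,
                       origin = "1970-01-01", tz = "UTC")

out_secs  <- format(dt_secs,  "%Y-%m-%d %H:%M:%S")
out_nanos <- format(dt_nanos, "%Y-%m-%d %H:%M:%S")

print(out_secs)   # "2019-03-10 20:08:48"
print(out_nanos)  # "2019-03-10 20:08:48"
```

    If the full nanosecond resolution matters, keeping the integer64 / nanotime column next to the POSIXct one, as in the data.table example, avoids the loss in the scaling step.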