Tags: r, data.table, double, bigint, int64

Dealing with large integers in R


I have an integer, 18495608239531729, that is causing me some trouble with the data.table package: when I read data from a CSV file that stores numbers this big, they are stored as integer64.

Now I would like to filter my data.table like dt[big_integers == 18495608239531729], which gives me a data type mismatch (comparing integer64 and double).
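
A minimal sketch of what I mean (assuming a one-column CSV; fread() reads values this large as integer64 by default):

library(data.table)

# hypothetical one-column CSV holding a very large integer
writeLines(c("big_integers", "18495608239531729"), "big.csv")

dt <- fread("big.csv")
class(dt$big_integers)
#[1] "integer64"

# the comparison in question: an integer64 column against a double literal
dt[big_integers == 18495608239531729]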

I figured that since 18495608239531729 is a really big number, I should perhaps use the bit64 package to handle the data types.

So I did:

library(bit64)
as.integer64(18495608239531729)

> integer64
> [1] 18495608239531728

I thought integer64 should be able to work with much larger values without any issues?

So I did:

as.integer64(18495608239531729) == 18495608239531729

> [1] TRUE

At which point I was happier, but then I figured, why not try:

as.integer64(18495608239531728)

> integer64
> [1] 18495608239531728

Which led me to also try:

as.integer64(18495608239531728) == as.integer64(18495608239531729)

> [1] TRUE

What is the right way to handle big numbers in R without loss of precision? Technically, in my case, I do not perform any mathematical operations on the said column, so I could treat it as a character vector (although I was worried that this would take up more memory, and that joins in R's data.table would be slower?).
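
For clarity, the character-based alternative I have in mind would look roughly like this (a sketch; colClasses makes fread() keep the column as text):

library(data.table)

# read the big column as character instead of integer64 (hypothetical file name)
dt_chr <- fread("big.csv", colClasses = list(character = "big_integers"))

# equality filtering then becomes a plain string comparison
dt_chr[big_integers == "18495608239531729"]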


Solution

  • You are passing a floating point number to as.integer64. The loss of precision is already in your input to as.integer64:

    is.double(18495608239531729)
    #[1] TRUE
    
    sprintf("%20.5f", 18495608239531729)
    #[1] "18495608239531728.00000"
    

    Pass a character string to avoid that:

    library(bit64)
    as.integer64("18495608239531729")
    #integer64
    #[1] 18495608239531729
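
    Applied back to the filtering problem from the question, build the comparison value from a string as well, so both sides are integer64 (a sketch; the column name is taken from the question, the data are hypothetical stand-ins for the CSV contents):

    library(bit64)
    library(data.table)

    # hypothetical example data
    dt <- data.table(big_integers = as.integer64(c("18495608239531729",
                                                   "18495608239531730")))

    # both sides are now integer64, so the comparison is exact
    dt[big_integers == as.integer64("18495608239531729")]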