Tags: r, string, data-structures, numeric

Why can converting numbers to characters change the numbers?


I imagine this has to do with R's data structures and the answer will be quick, but I haven't yet found one so here goes:

as.character(9875987598759875)
[1] "9875987598759876"

library(crayon)
chr(9875987598759875)
[1] "9875987598759876"

toString(9875987598759875)
[1] "9875987598759876"

What gives? How should I be making this conversion more safely?


Solution

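  • .Machine$integer.max indicates that the largest value R's integer type can store is 2147483647 (this could conceivably vary across platforms, but it's very unlikely to). A numeric literal such as 9875987598759875 is stored as a double-precision floating-point value instead, and doubles represent integers exactly only up to 2^53 (9007199254740992). Your number is larger than that, and between 2^53 and 2^54 only even integers are representable, so it is rounded to 9875987598759876 when it is parsed, before as.character() ever sees it. (Python, by contrast, expensively but magically converts integer variables to an arbitrary-length representation as necessary.)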

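    If you install the bit64 package you can use 64-bit integers, which are exact up to 2^63 - 1 = 9223372036854775807. Note that computing this limit in ordinary doubles runs into the same round-off and prints 2^63 instead:

    print(2^63-1,digits=22)
    [1] 9223372036854775808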
    

    If you start with a character string, you can safely do round-trip conversion to integer64 and back:

    library(bit64)
    cc <- "9875987598759875"
    x <- as.integer64(cc)
    identical(cc,as.character(x))
    ## [1] TRUE
    

    However, once you've read a number into R as a regular number it's typically too late, because the rounding has already happened. You can use colClasses="integer64" with read.table()/read.csv()/etc. to read values in as integer64; data.table::fread() has an integer64 argument for the same purpose, and I believe the file-reading functions from readr also have integer64-handling capabilities.

    For many applications, if you're not actually planning on doing anything numerical with these digit-strings, it's safest and easiest to make sure you import them as character in the first place ...
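As a rough sketch of the points above, using only base R: the first part checks where doubles stop being exact, and the second reads an ID column as character so the digits survive. The CSV text and its column names (id, value) are made up for illustration.

```r
# Doubles have a 53-bit significand, so integers are exact only up to 2^53.
limit <- 2^53                       # 9007199254740992
limit + 1 == limit                  # TRUE: 2^53 + 1 cannot be represented

# Between 2^53 and 2^54 only even integers exist as doubles, so these two
# literals are parsed to the same value.
literal_a <- 9875987598759875
literal_b <- 9875987598759876
literal_a == literal_b              # TRUE

# Hypothetical input: a small CSV with a 16-digit ID column.
csv_text <- "id,value\n9875987598759875,1\n"

# Reading the ID column as character keeps every digit.
ids <- read.csv(text = csv_text, colClasses = c("character", "numeric"))
ids$id                              # "9875987598759875"
```

If you do need arithmetic on the values, the integer64 route is the one to reach for; the character route only preserves the digits.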