
Why can converting numbers to characters change the numbers?

I imagine this has to do with R's data structures and the answer will be quick, but I haven't yet found one so here goes:

[1] "9875987598759876"

[1] "9875987598759876"

[1] "9875987598759876"

What gives? How should I be making this conversion more safely?


  • .Machine$integer.max indicates that the largest integer R can store is 2147483647 (this could conceivably vary across platforms, but it's very unlikely to). Any number larger than that is automatically stored as double-precision floating point, which represents integers exactly only up to 2^53 (9007199254740992). Beyond that you get round-off error, which is what changes the last digit in your output. (Python, by contrast, expensively but magically converts integer variables to an arbitrary-length representation as necessary.)
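
    To make the two limits concrete, here is a minimal base-R sketch. The variable names (two53, x, printed) are only illustrative, and it assumes the usual IEEE 754 doubles that R uses on common platforms:

    ```r
    # Largest value an R integer can hold (32-bit signed).
    .Machine$integer.max

    # Doubles represent every integer exactly only up to 2^53.
    two53 <- 2^53
    two53 == two53 + 1   # TRUE: 2^53 + 1 cannot be represented, so it rounds back

    # An odd literal above 2^53 is rounded to a neighbouring even value
    # as soon as it is parsed, before any conversion to character happens.
    x <- 9875987598759875
    printed <- sprintf("%.0f", x)
    printed
    ```

    The last line shows the value R actually stored, so the change is already present before as.character() is ever called.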

    If you install the bit64 package you can use 64-bit integers, with (presumably) exactness up to just under 2^63:

    [1] 9223372036854775808

    If you start with a character string, you can safely do round-trip conversion to integer64 and back:

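    library(bit64)
    cc <- "9875987598759875"
    x <- as.integer64(cc)
    ## check that converting back to character recovers the original string
    identical(as.character(x), cc)
    ## [1] TRUE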

    However, typically once you've read a number into R as a regular (double) number it's too late: the low-order digits are already gone. With bit64 loaded, you can use colClasses="integer64" with read.table()/read.csv()/etc. to read values in as integer64; I believe the file-reading functions from readr and data.table also have integer64-handling capabilities.

    For many applications, if you're not actually planning on doing anything numerical with these digit-strings, it's safest and easiest to make sure you import them as character in the first place ...
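
    As a rough illustration of the character route, the sketch below uses only base R. The inline CSV, its column names, and the variable names are made up for the example:

    ```r
    # Hypothetical two-row CSV supplied inline so the example is self-contained.
    csv_text <- "id,value\n9875987598759875,1.5\n0012345678901234567,2.5"

    # Default parsing guesses a numeric type for id, which cannot hold every digit.
    as_number <- read.csv(text = csv_text)

    # Forcing the id column to character keeps every digit, including leading zeros.
    as_text <- read.csv(text = csv_text, colClasses = c(id = "character"))
    as_text$id
    ```

    Naming the column in colClasses leaves the other columns to be guessed as usual, so only the digit-strings are protected.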