Weird error in R when importing (64-bit) integer with many digits


I am importing a CSV file with a single column of very long integers (for example, 2121020101132507598):

a <- read.csv('temp.csv', as.is = TRUE)

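When I import these integers as strings they come through correctly, but when they are imported as numbers the last few digits change. I have no idea what is going on. In the output below, the first column is the value read as a string and the second is the same value read as a number: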

1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312


Solution

  • As others have noted, you can't represent integers that large: R's integer type is 32-bit, with a maximum of 2147483647. But R isn't reading those values into integers; it's reading them into double-precision numerics.

    A double has a 53-bit significand, so it can represent only about 15 to 16 significant decimal digits accurately. Every integer up to 2^53 = 9007199254740992 is exact, but not every integer beyond that, which is why you see your numbers rounded after roughly 16 digits. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function to read from a file in any of them, but maybe you could cook something up by looking at their sources.
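
    These limits can be checked in base R. The following is only a minimal sketch: the variable names are made up for illustration, and the literal is the first value from the question.

    ```r
    # Largest value R's 32-bit integer type can hold
    int_max <- .Machine$integer.max          # 2147483647

    # Doubles carry a 53-bit significand, so 2^53 + 1 is not representable
    sig_bits <- .Machine$double.digits       # 53
    collides <- (2^53 + 1) == 2^53           # TRUE: the +1 is lost

    # Between 2^61 and 2^62 (where this value falls), adjacent doubles
    # are 2^(61 - 52) = 512 apart, so the last digits cannot all survive
    x <- 4031320121153001444                 # stored as a double, not exactly
    gap <- 2^(floor(log2(x)) - (sig_bits - 1))

    print(c(int_max = int_max, sig_bits = sig_bits, gap = gap))
    print(collides)
    ```

    That spacing matches the question's output: the first imported value, 4031320121153001472, is a multiple of 512 and differs from the original by 28.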

    UPDATE: Here's how you can get your file into an int64 object:

    # as.int64() comes from the int64 package
    library(int64)

    # This assumes your numbers are the only column in the file.
    # Read them in however you like; just make sure they are read as character.
    a <- scan("temp.csv", what = "")
    ia <- as.int64(a)
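
    If the file has a header row, read.csv can also be told to keep the column as character through colClasses, and the strings can be converted afterwards. This is a sketch under assumptions: the column name id and the sample file are invented for illustration, and the conversion uses as.integer64 from the bit64 package (a different package from int64 above), and only if bit64 happens to be installed.

    ```r
    # Write a small sample file so the sketch is self-contained
    # (the column name "id" is hypothetical)
    tmp <- tempfile(fileext = ".csv")
    writeLines(c("id", "4031320121153001444", "4113020071082679601"), tmp)

    # colClasses = "character" stops read.csv from converting to double
    df <- read.csv(tmp, colClasses = "character")
    print(df$id)

    # Optional: convert to 64-bit integers if bit64 is available
    if (requireNamespace("bit64", quietly = TRUE)) {
      ia <- bit64::as.integer64(df$id)
      print(ia)
    }
    ```

    Keeping the column as character is enough if the values are only identifiers; a 64-bit integer type matters only when you need arithmetic or numeric comparison on them.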