r data.table fread nanotime bit64

fread reads large integers as integer64, which are not upcast to double in arithmetic expressions


When a file is read with fread, columns holding large integers may be read as integer64 (correctly so). But when such a column is multiplied by a numeric, the result is not upcast to numeric, as it would be in C++ or for ordinary integers in R. This is documented behavior in the bit64 package, but it is not intuitive: in multiplication and similar operations, integer64 behaves differently from integer.

Also, dividing an integer64 by an integer gives a numeric result, so the behavior is quite bizarre!

Should we then always call fread with colClasses = "numeric" for columns that will be used in arithmetic expressions with numerics?


    file contents
    x,y
    111,0.3
    2147483648,0.3

    > d <- fread(file)
    > print(d)
                x       y
    1:        111       0.3
    2: 2147483648       0.3
    > print(d$x * d$y)   # the product is integer64, not numeric

    > as.integer64(111) * 8e-2
    integer64
    [1] 9
    > as.integer64(111) * 8 / 1e2
    [1] 8.88

Similarly, quantile and other R functions will not behave correctly with integer64. This issue creeps into every class built on integer64, such as nanotime.
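The asymmetry described above can be reproduced without any file. The following is a minimal sketch, assuming the bit64 package is installed; the variable names are only illustrative. It also shows that converting explicitly with as.numeric before multiplying gives ordinary double arithmetic.

```r
library(bit64)

x <- as.integer64(111)

# Multiplication keeps the integer64 class, so the fractional part is lost
prod64 <- x * 8e-2
class(prod64)    # "integer64"

# Division returns a plain double
quot <- x * 8 / 1e2
class(quot)      # "numeric"

# Explicit conversion first: the product is an ordinary double
prod_dbl <- as.numeric(x) * 8e-2
prod_dbl         # approximately 8.88
```

The explicit conversion is safe here because the values are small; doubles represent integers exactly only up to 2^53.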


Solution

  • This is the documented behaviour of the bit64 package; see "Arithmetic precision and coercion" in ?bit64:

    The fact that we introduce 64 bit long long integers – without introducing 128-bit long doubles – creates some subtle challenges

    The multiplication operator * coerces its first argument to integer64 but allows its second argument to be also double: the second argument is internally coerced to 'long double' and the result of the multiplication is returned as integer64

    as.integer64(111) * 8e-2
    integer64
    [1] 9
    

    The division / and power ^ operators also coerce their first argument to integer64 and coerce internally their second argument to 'long double', they return as double

    as.integer64(111) * 8 / 1e2
    [1] 8.88
    

    To avoid this, you could set the integer64 argument of fread to "double". Use this with care, as there is an open issue.
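As a rough sketch of that suggestion, assuming a reasonably recent data.table (one whose fread accepts the text argument), the sample data from the question can be read inline. The second call shows the per-column colClasses alternative raised in the question. The names csv, d1 and d2 are only illustrative.

```r
library(data.table)

# Same contents as the example file, passed inline instead of from disk
csv <- "x,y\n111,0.3\n2147483648,0.3\n"

# Option 1: read every column that would be integer64 as double
d1 <- fread(text = csv, integer64 = "double")

# Option 2: override only the column used in arithmetic
d2 <- fread(text = csv, colClasses = c(x = "numeric"))

class(d1$x)      # "numeric"
class(d2$x)      # "numeric"

# Ordinary double arithmetic, no truncation to whole numbers
d1$x * d1$y
```

Either way the column loses the exactness of integer64 for values beyond 2^53, so this suits quantities used in arithmetic better than large identifiers.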