Search code examples
rdatabasesqliter-dbi

How to retrieve large numbers from database with RSQLite? (Buffer overflow?)


I have a database with various token and frequency counts. When I try to retrieve these from within R using RSQLite I get low negative numbers, so presumably there is a buffer overflow issue. If I open the database in say sqliteman and manually query it the data are returned correctly, so it must be to do with DBI or RSQLite.

Is there a way to retrieve very large integers/floating point values via RSQLite without this issue?

Here is a minimal example:

library("DBI"); library("RSQLite")
db.h <- dbConnect(RSQLite::SQLite(), dbname=":memory:")
dbGetQuery(db.h, "CREATE TABLE test (a TEXT, b INT)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value1',   282817178)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
result = dbGetQuery(db.h, "SELECT * FROM test")
result
dbDisconnect(db.h)

The output I get is as follows:

>        a           b
> 1 value1   282817178
> 2 value2 -1559175274

I'm using RStudio 0.98.501 (R version 3.0.2) on 32-bit Windows 7, if that's relevant at all.

Edit: Note that SQLite 3 uses 64 bit integers for all values in memory and request processing. Using BIGINT in place of INT as suggested in the comments below produces the same overflow issue.


Solution

  • This post on the RSQLite issue tracker reveals that this appears to be a known issue/limitation of the implementation. It says this is (at least in part) to do with R not managing well with 64bit integers as used by SQLite.

    Here is a workaround that seems to work for me: Cast the column with large integers as TEXT in the SQL query, then manually readjust it as numeric value in R.

    Working example:

    library("DBI"); library("RSQLite")
    db.h <- dbConnect(RSQLite::SQLite(), dbname=":memory:")
    dbGetQuery(db.h, "CREATE TABLE test (a TEXT, b INT)")
    dbGetQuery(db.h, "INSERT INTO test VALUES ('value1',   282817178)")
    dbGetQuery(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
    result = dbGetQuery(db.h, "SELECT a, CAST(b as TEXT) as b FROM test")
    result$b <- as.numeric(result$b); result
    dbDisconnect(db.h)
    

    Output:

           a           b
    1 value1   282817178
    2 value2 15620693910