I have a database with various token and frequency counts. When I try to retrieve these from within R using RSQLite, I get large negative numbers, so presumably there is an integer overflow issue. If I open the database in, say, sqliteman and query it manually, the data are returned correctly, so the problem must lie with DBI or RSQLite.
Is there a way to retrieve very large integers/floating point values via RSQLite without this issue?
Here is a minimal example:
library("DBI"); library("RSQLite")
db.h <- dbConnect(RSQLite::SQLite(), dbname=":memory:")
dbGetQuery(db.h, "CREATE TABLE test (a TEXT, b INT)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value1', 282817178)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
result = dbGetQuery(db.h, "SELECT * FROM test")
result
dbDisconnect(db.h)
The output I get is as follows:
       a           b
1 value1   282817178
2 value2 -1559175274
I'm using RStudio 0.98.501 (R version 3.0.2) on 32-bit Windows 7, if that's relevant at all.
Edit: Note that SQLite 3 uses 64-bit integers for all values in memory and during request processing. Using BIGINT in place of INT, as suggested in the comments below, produces the same overflow issue.
This post on the RSQLite issue tracker reveals that this is a known issue/limitation of the implementation. It is (at least in part) due to R not handling the 64-bit integers used by SQLite: base R integers are 32-bit, so anything above 2^31 - 1 wraps around into negative values.
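For what it's worth, newer versions of RSQLite (2.0 and later) expose a `bigint` argument on `dbConnect()` that controls how 64-bit integers are returned. The sketch below assumes an up-to-date RSQLite and the bit64 package, neither of which may be available on the old setup described above:

```r
library(DBI)
library(RSQLite)
library(bit64)  # provides the integer64 class that RSQLite can return

# bigint = "integer64" maps SQLite's 64-bit INTEGER values onto
# bit64::integer64 instead of truncating them to R's 32-bit integers
db.h <- dbConnect(RSQLite::SQLite(), dbname = ":memory:", bigint = "integer64")
dbExecute(db.h, "CREATE TABLE test (a TEXT, b INT)")
dbExecute(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
result <- dbGetQuery(db.h, "SELECT * FROM test")
result$b  # an integer64 value, no overflow
dbDisconnect(db.h)
```

`bigint` also accepts `"numeric"` or `"character"` if you would rather get doubles or strings back instead of depending on bit64.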
Here is a workaround that works for me: cast the column with large integers to TEXT in the SQL query, then convert it back to a numeric value in R. R doubles represent integers exactly up to 2^53, which is plenty of headroom here.
Working example:
library("DBI"); library("RSQLite")
db.h <- dbConnect(RSQLite::SQLite(), dbname=":memory:")
dbGetQuery(db.h, "CREATE TABLE test (a TEXT, b INT)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value1', 282817178)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
result = dbGetQuery(db.h, "SELECT a, CAST(b as TEXT) as b FROM test")
result$b <- as.numeric(result$b); result
dbDisconnect(db.h)
Output:
       a           b
1 value1   282817178
2 value2 15620693910
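An alternative that skips the round-trip through character entirely: cast to REAL in the SQL query, since RSQLite returns SQLite REAL values as R doubles, which are exact for integers below 2^53. A minimal sketch of the same example:

```r
library(DBI)
library(RSQLite)

db.h <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbGetQuery(db.h, "CREATE TABLE test (a TEXT, b INT)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value1', 282817178)")
dbGetQuery(db.h, "INSERT INTO test VALUES ('value2', 15620693910)")
# CAST to REAL: SQLite hands back a double, so no 32-bit truncation
# occurs and no as.numeric() step is needed on the R side
result <- dbGetQuery(db.h, "SELECT a, CAST(b AS REAL) AS b FROM test")
result
dbDisconnect(db.h)
```

The trade-off is that values beyond 2^53 would lose precision as doubles, whereas the TEXT route at least lets you inspect the exact digits before converting.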