I'm trying to get some data from an Impala database using the sqlQuery function from the RODBC package. The results change from one execution to the next, even though the query is exactly the same.
The data.frame I get doesn't always have the same number of rows:
library("RODBC")
conn <- odbcConnect("Cloudera Impala DSN;host=mydb;port=21050")
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 26600 220
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 142561 220
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 23500 220
This query should in fact return a 142561 x 220 data frame.
On the other hand, the following query always returns the same (correct) result:
sqlQuery(conn, "select count(*) from hydr where flight= 'V0051' ")
count(*)
1 142561
It seems my problem was that Impala didn't have enough memory to perform well.
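Since count(*) was reliable even when the full fetch was not, one way to at least detect a short result is to compare the number of rows fetched with the count. The following is only a minimal sketch: `fetch_checked` and its arguments are hypothetical names, not part of RODBC, and retrying does not fix a memory shortage on the Impala side. It only stops a truncated data frame from being used silently.

```r
# Hypothetical helper (not part of RODBC): fetch rows and verify that the
# number of rows matches what count(*) reports for the same filter.
# `run_query` is any function(conn, sql); it defaults to RODBC::sqlQuery.
fetch_checked <- function(conn, table, where,
                          run_query = RODBC::sqlQuery, max_tries = 3) {
  count_sql <- sprintf("select count(*) from %s where %s", table, where)
  data_sql  <- sprintf("select * from %s where %s", table, where)

  # sqlQuery returns a character vector of messages on error,
  # so anything that is not a data frame is treated as a failure.
  counted <- run_query(conn, count_sql)
  if (!is.data.frame(counted)) {
    stop("count query failed: ", paste(counted, collapse = "; "))
  }
  expected <- as.numeric(counted[[1]][1])

  for (i in seq_len(max_tries)) {
    df <- run_query(conn, data_sql)
    if (is.data.frame(df) && nrow(df) == expected) return(df)
    got <- if (is.data.frame(df)) nrow(df) else NA
    warning(sprintf("attempt %d: got %s rows, expected %s", i, got, expected))
  }
  stop("row count never matched count(*) after ", max_tries, " attempts")
}

# Usage with the connection from the question (assumes `conn` is open):
# df <- fetch_checked(conn, "hydrau.hydr", "flight = 'V0051'")
```

The query function is passed in as an argument so the check can be tried with a stub, without a live Impala connection.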