Why is sqlQuery from RODBC not always returning the same data when querying an Impala DB?


I'm trying to get some data from an Impala database using the sqlQuery function from the RODBC package. The results change from one execution to the next, even though the query is exactly the same.

The data.frame I get doesn't always have the same number of rows:

library("RODBC")
conn <- odbcConnect("Cloudera Impala DSN;host=mydb;port=21050")
df <- sqlQuery(conn, "select * from hydrau.hydr where flight = 'V0051'")
dim(df)
[1] 26600   220
df <- sqlQuery(conn, "select * from hydrau.hydr where flight = 'V0051'")
dim(df)
[1] 142561   220
df <- sqlQuery(conn, "select * from hydrau.hydr where flight = 'V0051'")
dim(df)
[1] 23500   220

This query should in fact return a 142561 x 220 data frame.

On the other hand, the following query always returns the same (correct) result:

sqlQuery(conn, "select count(*) from hydr where flight= 'V0051' ")
  count(*)
1   142561

Solution

  • It seems my problem was that Impala didn't have enough memory to perform well.
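
Since count(*) stayed stable while select * came back short, one way to catch a truncated result is to compare the number of rows fetched against the count before using the data. What follows is only a minimal sketch, not a fix for the memory problem itself. The helper name fetch_checked, its arguments, and the retry limit are all hypothetical. It takes the query runner as a function so it can be tried without a database; with RODBC you would pass something like function(sql) sqlQuery(conn, sql). It relies on sqlQuery's documented default behaviour (errors = TRUE) of returning a character vector of error messages instead of a data frame when a query fails.

```r
# Hypothetical helper: fetch rows and check them against count(*).
# `run_query` is any function(sql) that returns a data.frame on success,
# e.g. function(sql) RODBC::sqlQuery(conn, sql)
fetch_checked <- function(run_query, table, where, max_tries = 3) {
  count_sql <- sprintf("select count(*) from %s where %s", table, where)
  data_sql  <- sprintf("select * from %s where %s", table, where)

  # Ask the server how many rows the full result should contain.
  counted <- run_query(count_sql)
  if (!is.data.frame(counted)) {
    stop("count query failed: ", paste(counted, collapse = "; "))
  }
  expected <- as.numeric(counted[[1]][1])

  for (i in seq_len(max_tries)) {
    df <- run_query(data_sql)
    # Accept the result only if it is a data frame of the expected size.
    if (is.data.frame(df) && nrow(df) == expected) {
      return(df)
    }
    got <- if (is.data.frame(df)) nrow(df) else paste(df, collapse = "; ")
    warning(sprintf("try %d: expected %s rows, got %s", i, expected, got))
  }
  stop("result never matched count(*); check the Impala server's memory and logs")
}
```

With the connection from the question, this would be called as fetch_checked(function(sql) sqlQuery(conn, sql), "hydrau.hydr", "flight = 'V0051'"). Retrying only hides the symptom; if the warnings keep appearing, the server-side cause (here, too little memory for Impala) still needs fixing.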