I'm pulling 1.6M rows across 54 columns from Snowflake to R desktop using odbc and DBI.
According to my Snowflake dashboard the query took 4.9 sec to run from the database end but when I wrapped my query with tic() and toc() it took about 50 mins to load into my R data frame.
snowflake <- DBI::dbConnect(odbc::odbc(), "Snowflake")
tic()
x <- DBI::dbGetQuery(conn = snowflake, statement = "SELECT * FROM DB.SCHEMA.TABLE")
toc()
I've downgraded my R version from 4.2.2 to 4.1.2 and have updated my odbc package to 1.3.4 and I still have issues pulling down the data. I'm running a 64 bit windows machine and have the latest 64 bit Snowflake driver (SnowflakeDSIIDriver).
I've worked with my IT department and checked my network latency and speeds and all is in good order.
Is there some other issue I may be overlooking that could be causing such delays getting the data from the Snowflake warehouse to my R dataframe?
There are two things involved in this flow:
first, the query is executed in Snowflake and you see its running time on Query History like in your screenshot.
secondly the result set of a large query is stored temporarily on the internal stage of your Snowflake account and the driver (ODBC) is instructed to download it from there. From a driver point of view the query finishes when the full result set is download. Based on the size of it, this may take a while.
Also, as a note make sure ODBC driver has no logging enabled at Trace level as this would slow it down significantly.