Search code examples
rduckdbr-dbi

reusing duckdb connection in windows with `DBI::dbSendQuery()`


Calling the following function twice will throw an error in Windows, but not Ubuntu. I tried on two machines each. It is related to DBI::dbSendQuery() -if that's commented out, the function can execute repeatedly without error. All four machines have duckdb v1.0.0 and DBI v1.2.3. This happens (with similar error messages) if the arguments are inside or outside the call to duckdb::duckdb().

The only solution I've found it to keep the connection opened the whole execution. It requires restarting R (ie, ctl+shift+F10 in RStudio) to be able to hit the same DuckDB file again.

What can I do differently?

path_db <- "~/so-question.duckdb"
f <- function() {
  sql_1 <-
    "
      DROP TABLE if exists t1;
      CREATE TABLE t1 (
        id   int  primary key,
        s    text not null,
      );
    "
  sql_2 <- "SHOW ALL TABLES;"
  cn <- DBI::dbConnect(duckdb::duckdb(), dbdir = path_db, bigint = "integer64")
  # Alternative connection
  # cn <- DBI::dbConnect(duckdb::duckdb(dbdir = path_db, bigint = "integer64"))
  
  ds_1 <- DBI::dbSendQuery(cn, sql_1)
  ds_2 <- DBI::dbGetQuery(cn, sql_2)
  DBI::dbDisconnect(cn, shutdown = TRUE)
  rm(cn)
  
  ds_2
}

f()
#      database schema name column_names     column_types temporary
# 1 so-question   main   t1        id, s INTEGER, VARCHAR     FALSE

f() # Throws an error in Windows, but not Ubutu.

output:

Error: rapi_startup: Failed to open database: {"exception_type":"IO","exception_message":"File is already open in \nC:\\Program Files\\RStudio\\resources\\app\\bin\\rsession-utf8.exe (PID 14948)"}

output from alternate connection:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'drv' in selecting a method for function 'dbConnect': rapi_startup: Failed to open database: {"exception_type":"IO","exception_message":"File is already open in \nC:\\Program Files\\RStudio\\resources\\app\\bin\\rsession-utf8.exe (PID 12140)"}

Solution

  • Recent versions of the duckdb R package disconnect properly, but garbage collection must be invoked.

    path_db <- "~/so-question.duckdb"
    f <- function() {
      sql_1 <-
        "
          DROP TABLE if exists t1;
          CREATE TABLE t1 (
            id   int  primary key,
            s    text not null,
          );
        "
      sql_2 <- "SHOW ALL TABLES;"
      cn <- DBI::dbConnect(duckdb::duckdb(), dbdir = path_db, bigint = "integer64")
      # Alternative connection
      # cn <- DBI::dbConnect(duckdb::duckdb(dbdir = path_db, bigint = "integer64"))
      
      # Use dbExecute() for side effects
      DBI::dbExecute(cn, sql_1)
      ds_2 <- DBI::dbGetQuery(cn, sql_2)
      
      # No longer need shutdown = TRUE
      DBI::dbDisconnect(cn)
      rm(cn)
      
      # This is needed
      gc()
      
      ds_2
    }