Search code examples
rfunctiondatabase-connectionscopingrmysql

How to connect to a database outside a function, disconnect inside a function, and then reconnect inside that function with R & RMySQL


I have an R script utilizing the RMySQL library that connects to an Amazon RDS. The script initially connects to the RDS prior to running the function. Inside the function, the MySQL on the RDS is queried, and then the Sys.sleep command is executed for a duration of 24 hours (86400 seconds) and then resumes. Because Amazon's RDS wait_timeout parameter is set to 8 hours (28800) which results in this error in my R console:

Error in .local(conn, statement, ...) :
    could not run statement: MySQL server has gone away

I cannot do this so I have tried the following code:

duration <- 86400
con <- dbConnect(MySQL(),
       user = passwordFile$user[1],
       password = passwordFile$password[1],
       host = passwordFile$host[1],
       dbname = passwordFile$dbname[1])
query <- function(...) dbGetQuery(con, ...)
testing <- function(){
       test1 <- query("SELECT * FROM a_log")
       print(test1)
       dbDisconnect(con)
       Sys.sleep(duration)
       con <- dbConnect(MySQL(),
              user = passwordFile$user[1],
              password = passwordFile$password[1],
              host = passwordFile$host[1],
              dbname = passwordFile$dbname[1])
       test2 <- query("SELECT * FROM a_log")
       return(test2)
}
temp <- testing()
print(temp)

The test1 object prints fine. However, I get the following error (even when I set duration to 1) when trying to print test2:

>Error in mysqlQuickSQL(conn, statement, ...) : expired MySQLConnection

Is it possible to connect to a MySQL server.database outside of a function and then to disconnect within a function and then reconnect within the function? Thank you!!


Solution

  • Without being able to actually try out this code, a possible problem is that you have two functions trying to manage one connection. It can help to separate tasks with functions: connect_to_amazon creates the connection, query returns a table, and testing wraps a process.

    It's also risky to have functions reference values in a parent environment, since those can change or not even exist later.

    You can have query, or any other function that accesses the Amazon database, use connect_to_amazon to make the connection, then on.exit to make sure the it's closed when the function's done.

    connect_to_amazon <- function(password_file) {
      dbConnect(
        MySQL(),
        user     = password_file$user[1],
        password = password_file$password[1],
        host     = password_file$host[1],
        dbname   = password_file$dbname[1]
      )
    }
    
    query <- function(..., password_file) {
      amazon_con <- connect_to_amazon(password_file)
      on.exit(dbDisconnect(amazon_con), add = TRUE)
      dbGetQuery(conn = amazon_con, ...)
    }
    
    testing <- function(password_file) {
      test1 <- query("SELECT * FROM a_log", password_file = password_file)
      print(test)
      Sys.sleep(duration)
      test2 <- query("SELECT * FROM a_log", password_file = password_file)
      return(test2)
    }
    

    If you needed to do any other tasks with the Amazon database, you can either write another wrapper function (e.g., write_amazon_table(..., password_file)), or just call connect_to_amazon and work directly with the connection.