Search code examples
rrmysql

load all the MYSQL tables into data frames with same names


I have successfully loaded all MySQL tables into as many dataframe variables, all of same names as table names (this was a personal learning) using the following code. My question is: Is there a better method, as I get a feeling this is quite slow.

db<-dbConnect(MySQL(),user='****',password="****",dbname="****",host='XX.XXX.XXX.XX') 

  tables<-dbListTables(db)

  #load all tables in variables with same name as the MySQL table name. 
  #warning : this loop will take more than 90 seconds to complete and will download all the MySQL tables.
  for (i in 1:NROW(tables)){
    assign(tables[i],dbReadTable(db,tables[i]))
  }

#release the DB
  dbDisconnect(db)

Is this the fastest method on the planet?


Solution

  • As shown, consider saving all MySQL tables into one list of many dataframes which avoids having to manage many objects in your global environment. Below even names the list elements to the corresponding table name which can be referenced with $ or [[..]] indexing:

    # DATA FRAME LIST OF NAMED ELEMENTS
    dfList <- setNames(lapply(tables, function(t) dbReadTable(db, t)), tables)
    
    dfList$table1Name    
    dfList[["table2Name"]]    
    ...
    

    Hence, you can access each dataframe by index and run any dataframe operations as if they were independent objects:

    aggregate(.~Group1 + Group2, dfList$table1Name, FUN=sum)
    
    merge(dfList$table1Name, dfList$table2Name, by="joinkey")
    
    by(dfList$table1Name, dfList$table1Name[c("factor1", "factor2")], FUN=summary)
    

    Now, if you really, really do want multiple variables, use list2env to output list elements to separate objects:

    list2env(dfList, envir=.GlobalEnv)