Tags: mysql, r, mysql-workbench, rmysql, knime

Why does R upload data much faster than KNIME or Workbench?


What I want to know is: what actually happens under the hood when I upload data through R that makes it so much faster than MySQL Workbench or KNIME?

I work with data and, every day, I upload data into a MySQL server. I used to upload with KNIME, since it was much faster than uploading with MySQL Workbench (select the table -> "Import Data").

Some info: the CSV has 4000 rows and 15 columns. The library I used in R is RMySQL. The node I used in KNIME is Database Writer.

library('RMySQL')

# Read the semicolon-separated CSV file
df <- read.csv('C:/Users/my_user/Documents/file.csv', encoding = 'UTF-8', sep = ';')

# Open a connection to the MySQL server
connection <- dbConnect(
    RMySQL::MySQL(),
    dbname = "db_name",
    host = "yyy.xxxxxxx.com",
    user = "vitor",
    password = "****"
)

# Append the data frame to the existing table
dbWriteTable(connection, "table_name", df, append = TRUE, row.names = FALSE)

So, to test, I ran the exact same process with the same file. It took 2 minutes in KNIME and only seconds in R.


Solution

  • Everything happens under the hood! How fast a tool uploads data to a database depends on several factors: the interface between the database and the tool, network connectivity, the configured batch size, the memory available to the tool, the tool's own processing speed, and probably more. In your case, the RMySQL package uses a batch size of 500 by default, while KNIME defaults to 1, so that is most likely where the difference comes from. Try setting the batch size to 500 in KNIME and then compare again. I have no idea how MySQL Workbench handles this.
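
To illustrate why the batch size matters, here is a minimal sketch in R. It is not how KNIME or RMySQL work internally; it just contrasts sending one row per call (roughly what a batch size of 1 means) with handing the whole data frame to the driver in a single call. The connection details and file path are the placeholders from the question.

library(DBI)
library(RMySQL)

# Placeholder connection details taken from the question
connection <- dbConnect(
    RMySQL::MySQL(),
    dbname = "db_name",
    host = "yyy.xxxxxxx.com",
    user = "vitor",
    password = "****"
)

df <- read.csv('C:/Users/my_user/Documents/file.csv', encoding = 'UTF-8', sep = ';')

# Batch size of 1: one statement (and one network round trip) per row,
# so the per-statement overhead is paid 4000 times.
for (i in seq_len(nrow(df))) {
    dbWriteTable(connection, "table_name", df[i, , drop = FALSE],
                 append = TRUE, row.names = FALSE)
}

# Batched: one call for the whole data frame, so the driver can send
# the rows together and the overhead is paid only once.
dbWriteTable(connection, "table_name", df, append = TRUE, row.names = FALSE)

dbDisconnect(connection)

The two approaches write the same data; the difference is how often the fixed cost of a statement and a network round trip is paid, which is why a small batch size slows the upload down so much.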