Search code examples
rinsertbulkinsertrjdbc

R: Insert csv-file into database using RJDBC


As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.

I can make the following work:

# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")

# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")

The insert works. Next I try to make an INSERT of a CSV-file with the same data, that is separated by the default "tab" and I am working on windows.

# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)

# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")

Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)

Do you have any suggestions to what I am doing wrong, when trying to insert the csv-file? I do not get the Incorrect syntax near the keyword 'FROM' error?


Solution

  • What if you create a statement from your data? Something like:

    # Data from your example
    df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
    colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
    class(df$SORTFELT_10) <- "character"
    class(df$OPGAVENR) <- "character"
    class(df$PRODDTO) <- "character"
    class(df$PRODDTOFORRIG) <- "character"
    class(df$opretdato) <- "character"
    
    # Formatting rows to insert into SQL statement
    rows <- apply(df, 1, function(x){paste0('"', x, '"', collapse = ', ')})
    rows <- paste0('(', rows, ')')
    
    # SQL statement
    statement <- paste0(
      "INSERT INTO cpr_esben.CPR000_Startrecord (", 
      paste0(colnames(df), collapse = ', '), 
      ')',
      ' VALUES ',
      paste0(rows, collapse = ', ')
    )
    
    dbSendQuery(ch, statement)
    

    This should work for any number of rows in your df