Search code examples
rdatabasebulkinsertdbivertica

Push/Export large datframe from R to Vertica database


I have a dataframe of 10M rows which needs to be uploaded back from R to Vertica Database.

The DBwrite() function from DBI is running into memory issues and I have tried increasing memory to 16g by

options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx16g"))

Still the process is running into memory issue. I am planning to use bulk copy option of vertica to copy the csv file to create the table.

  1. I have created an empty table on vertica
  2. When I am executing the query
dbSendQuery(vertica, "COPY hpcom_usr.VM_test FROM LOCAL \'/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv\'  enclosed by \'\"\' DELIMITER \',\' direct REJECTED DATA \'./code/temp/rejected.txt\' EXCEPTIONS \'./code/temp/exceptions.txt\'")

I am running into this error.

Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set", : Unable to retrieve JDBC result set JDBC ERROR: [Vertica]JDBC A ResultSet was expected but not generated from query "COPY hpcom_usr.VM_test FROM LOCAL '/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv' enclosed by '"' DELIMITER ',' direct REJECTED DATA './code/temp/rejected.txt' EXCEPTIONS './code/temp/exceptions.txt'". Query not executed.

Please help with what i'm doing wrong here. Vertica also provides STDIN option aswell. Link

Please help me how can I execute this.

My Environment.

CENT OS 7 R 3.6.3 (No R Studio here I have to execute this from CLI) Tidyverse 1.0.x Vertica driver 9.x System 128GB Memory and 28Core system.


Solution

  • dbSendUpdate(vertica, "COPY hpcom_usr.VM_test FROM LOCAL \'/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv\'  enclosed by \'\"\' DELIMITER \',\' direct REJECTED DATA \'./code/temp/rejected.txt\' EXCEPTIONS \'./code/temp/exceptions.txt\'")
    

    instead of dbSendQuery did the trick for me.