Tags: r, csv, rpostgresql

Import multiple CSV files into a PostgreSQL database using R (memory error)


I am trying to import a dataset (spread across many CSV files) into R and then write the data into a table in a PostgreSQL database.

I successfully connected to the database and created a loop to import the CSV files, but the import fails: R returns an error because my PC runs out of memory.

My question is: is there a way to create a loop that imports the files one after another, writes each one into the PostgreSQL table and then removes it from memory? That way I would not run out of memory.

Code that produces the memory error:

    library(DBI)
    library(RPostgreSQL)   # PostgreSQL driver
    library(readr)         # read_csv()
    library(dplyr)         # bind_rows() and %>%

    # connect to the PostgreSQL database
    db_tankdata <- 'tankdaten'
    host_db     <- 'localhost'
    db_port     <- '5432'
    db_user     <- 'postgres'
    db_password <- 'xxx'
    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, dbname = db_tankdata, host = host_db,
                     port = db_port, user = db_user, password = db_password)

    # check if the connection was successful
    dbExistsTable(con, "prices")

    # create a function to load multiple CSV files into one data frame
    import_csvfiles <- function(path){
      files <- list.files(path, pattern = "*.csv", recursive = TRUE, full.names = TRUE)
      lapply(files, read_csv) %>% bind_rows() %>% as.data.frame()
    }

    # import all files at once and write them to the database
    prices <- import_csvfiles("path...")
    dbWriteTable(con, "prices", prices, append = TRUE, row.names = FALSE)

Thanks in advance for the feedback!


Solution

  • If you change the lapply() call to use an anonymous function, you can read each file and write it to the database immediately, so only one file's data is held in memory at a time. Since lapply() acts as an implied for() loop, you don't need an extra looping mechanism. A short usage sketch follows the code below.

    import_csvfiles <- function(path){
      files <- list.files(path, pattern = "*.csv", recursive = TRUE, full.names = TRUE)
      lapply(files, function(x){
        # read one file, append it to the table, then let it go out of scope
        prices <- read.csv(x)
        dbWriteTable(con, "prices", prices, append = TRUE, row.names = FALSE)
      })
    }
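
  • A minimal usage sketch, assuming the connection object con from the question is already open and "path..." is the same placeholder path the question uses:

        # each CSV is read and appended to "prices" one at a time
        import_csvfiles("path...")

        # close the connection once the import is done
        dbDisconnect(con)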