Tags: r, import, merge, concatenation

How do I make my function import and concatenate/merge "all" the files in a folder?


Due to... limitations, I have been forced to download my data manually, one csv file at a time. Until now, this hasn't been an issue: I've saved all of my files in the same folder, so I've been able to use a function to simply merge them (all column names are exactly the same).

Recently, however, I have had to download far more data than before. I am currently trying to import/concatenate 513 csv files at the same time, and it seems my function has hit some kind of limit: not all of the csv files are imported anymore, which is of course very disconcerting.

I tried moving the files that were not imported (together with some that were imported successfully) to another folder, and I could import/concatenate those files just fine. So the problem doesn't seem to lie in the files themselves but in the sheer number of them being imported/concatenated at once.

Is there a way to import and concatenate "all" files in a folder with no limitations?

The top 4 and bottom 4 lines in each csv file contain metadata and need to be disregarded. Until now I've been using the following loop to import/concatenate my files:

library(readr)
setwd("path")
file_list<-list.files("path")
for (file in file_list){

  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- head(read_delim(file, delim=';',na="",skip=4),-4)
  } else {
    # otherwise append to it; using `else` rather than a second `if`
    # keeps the first file from being appended twice
    temp_dataset <-head(read_delim(file, delim=';',na="",skip=4),-4)
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}
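A side note on the "top 4 and bottom 4 lines" requirement: skip = 4 plus head(..., -4) still lets the parser see the footer lines, and when a parser guesses column types from the data, stray footer text can turn a numeric column into character. A minimal base R sketch that drops the metadata lines before parsing might look like this. read_trimmed() is a hypothetical helper name, and the sketch assumes every file is semicolon-delimited with exactly 4 metadata lines at each end.

```r
# Hypothetical helper (not from the question): read one semicolon-delimited
# file, dropping `n_head` metadata lines at the top and `n_foot` at the bottom
# *before* parsing, so the metadata never reaches read.csv().
read_trimmed <- function(path, n_head = 4, n_foot = 4) {
  lines <- readLines(path)
  # there must be at least a header line between the two metadata blocks
  stopifnot(length(lines) > n_head + n_foot)
  body <- lines[(n_head + 1):(length(lines) - n_foot)]
  read.csv(text = body, sep = ";", na.strings = "")
}

# Usage on a small throwaway file laid out as the question describes
tmp <- tempfile(fileext = ".csv")
writeLines(c("meta 1", "meta 2", "meta 3", "meta 4",
             "id;value",
             "1;10",
             "2;",
             "end 1", "end 2", "end 3", "end 4"), tmp)
d <- read_trimmed(tmp)
str(d)
```

Because parsing starts at the header line and stops at the last data row, the column types are guessed from the data rows alone.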

Solution

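  • Instead of growing the dataset with rbind() inside a loop, which copies everything accumulated so far on every iteration, read all files into a list and bind them once. In base R, you would use do.call(rbind, list_data). With data.table, you can use data.table::rbindlist, which is more efficient.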

    data.table

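    library(data.table)
    # full paths, so no setwd() is needed; only .csv files are picked up
    file_list <- list.files("path", pattern = "\\.csv$", ignore.case = TRUE, full.names = TRUE)
    # fread() takes sep/na.strings (not delim/na); skip the 4 metadata lines, drop the last 4 rows
    list_data <- lapply(file_list, function(file) head(fread(file, sep = ";", na.strings = "", skip = 4), -4))
    df <- rbindlist(list_data, fill = TRUE, use.names = TRUE)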
    

    I added the arguments fill = TRUE and use.names = TRUE to be safe: you lose a little efficiency, but columns are matched by name rather than by position, and a column missing from some file is filled with NA instead of causing an error.

    Base R

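    # full paths, so no setwd() is needed; only .csv files are picked up
    file_list <- list.files("path", pattern = "\\.csv$", ignore.case = TRUE, full.names = TRUE)
    # base read.csv() with sep = ";" instead of readr::read_delim(); skip the 4 metadata lines, drop the last 4 rows
    list_data <- lapply(file_list, function(file) head(read.csv(file, sep = ";", na.strings = "", skip = 4), -4))
    df <- do.call(rbind, list_data)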
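To illustrate that the list-then-bind approach does not silently drop files at this scale, here is a minimal base R sketch run against a throwaway folder of 600 generated files (more than the 513 in the question). Everything in it is an assumption for illustration: the folder, the file layout (4 metadata lines at each end, ";" as delimiter), the hypothetical helper read_one() and the source_file column. Tagging each row with its file name also makes it easy to list any files that contributed no rows.

```r
# Throwaway folder with 600 small files laid out like the question's:
# 4 metadata lines, a header, 3 data rows, 4 metadata lines.
csv_dir <- file.path(tempdir(), "many_csv")
dir.create(csv_dir, showWarnings = FALSE)
for (i in seq_len(600)) {
  writeLines(c(rep("metadata", 4),
               "id;value",
               paste0(i, ";", 1:3),
               rep("footer", 4)),
             file.path(csv_dir, sprintf("file_%03d.csv", i)))
}

# Hypothetical reader: drop the 4 lines at each end before parsing and
# record which file every row came from.
read_one <- function(path) {
  lines <- readLines(path)
  body <- lines[5:(length(lines) - 4)]
  d <- read.csv(text = body, sep = ";", na.strings = "")
  d$source_file <- basename(path)
  d
}

file_list <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
# read everything into a list first, then bind once
df <- do.call(rbind, lapply(file_list, read_one))

# files that contributed no rows (should be none)
missing_files <- setdiff(basename(file_list), unique(df$source_file))
length(file_list)       # 600
nrow(df)                # 1800
length(missing_files)   # 0
```

With data.table, setting names(list_data) <- basename(file_list) and passing idcol = "file" to rbindlist() should give an equivalent file-name column.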