Tags: r, dataframe, dplyr, data-binding, rdata

Load multiple RData files efficiently


I have a question about reading files into R. I have a folder with many files that I would like to combine into a single dataframe to work with. What is the most efficient way to read in a large number of files (over 1000)? My code is below; it has been running for a day now and still not all files have been read in.

data = data.frame()

for (file in files) {
  path = paste0("Data/", file, ".RData")
  if (file.exists(path)) {
    load(path)  # each file contains an object named file_data
    data = dplyr::bind_rows(data, file_data)
  }
}

Solution

  • You can list all the files, read them into a list, and bind them once at the end. Binding a single time avoids the repeated copying that makes a bind inside a loop progressively slower.

    library(dplyr)
    
    my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)
    
    # load() returns the names of the objects it restored into .GlobalEnv
    all_data <- lapply(my_files, load, .GlobalEnv)
    
    bind_rows(mget(unlist(all_data), envir = .GlobalEnv))
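    This works because load() invisibly returns a character vector of the names of the objects it restored, so unlist(all_data) collects every loaded name and mget() fetches the objects themselves. The downside is that all of those objects are dumped into your global environment.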
    

    A cleaner approach than mget is to create a fresh environment with new.env(), load the files into it, and then convert that environment to a list:

    library(dplyr)
    
    my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)
    
    # load everything into a temporary environment instead of .GlobalEnv
    temp <- new.env()
    invisible(lapply(my_files, load, temp))
    
    all_data <- as.list(temp)
    rm(temp)
    
    bind_rows(all_data)
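
    If each .RData file holds exactly one data frame, you can also skip the shared environment and read every file straight into a list element. This is a minimal sketch under that single-object-per-file assumption; the helper name load_one is illustrative, not from the original answer:

    library(dplyr)
    
    # Hypothetical helper: load one .RData file into its own environment
    # and return the single object it contains.
    load_one <- function(path) {
      e <- new.env()
      nm <- load(path, envir = e)  # load() returns the loaded object names
      get(nm[1], envir = e)        # assumes exactly one object per file
    }
    
    my_files <- list.files(path = "Data/", pattern = "\\.RData$", full.names = TRUE)
    all_data <- bind_rows(lapply(my_files, load_one))

    Because each file's objects live in their own short-lived environment, nothing leaks into the workspace and name clashes between files cannot occur.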