I have 100 .rds files, each approximately 2510 KB in size, and would like to bind them all together by row into one large data file.
So far I am using this:
library(data.table)
memory.limit(size = 1500000000)
files <- list.files(path = "mypath", pattern = "\\.rds$", full.names = TRUE)
dat_list <- lapply(files, function(x) data.table(readRDS(x)))
all <- do.call("rbind", dat_list)
This seems to work, but when running the final line I get a "cannot allocate vector of size..." error, which I understand means the combined object I am trying to create is too large to fit in memory.
As you can see, I have tried increasing the memory limit in R, but this does not help. Is there any way I can get around this? I have read of methods of combining csv files outside of R so that R's memory is not affected - is there a similar method that can be used here?
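For reference, the CSV-style approach I have read about looks roughly like this untested sketch: each file is appended to a single CSV on disk so only one chunk sits in memory at a time ("combined.csv" is just a placeholder name, and column types such as factors would be flattened to text):
library(data.table)
files <- list.files(path = "mypath", pattern = "\\.rds$", full.names = TRUE)
out_csv <- "combined.csv"                      # placeholder output file
for (f in files) {
  chunk <- as.data.table(readRDS(f))           # read one file at a time
  first <- !file.exists(out_csv)
  fwrite(chunk, out_csv, append = !first, col.names = first)  # header only on first write
  rm(chunk); gc()                              # free memory before the next file
}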
I intend to convert the result to a file-mapped big.matrix object later, if that helps. I also have the same files in RData format.
Would appreciate any help anyone can offer!
Update: the newer purrr::map_df() function combines map() and bind_rows() and returns a data frame (https://purrr.tidyverse.org/reference/map.html):
library(tidyverse)
my_files <- list.files(pattern = "\\.rds$")
my_all <- map_df(my_files, read_rds)
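If you also want to record which file each row came from, map_df() accepts an .id argument that it passes on to bind_rows(); naming the file vector makes the labels the file names (an optional extension, and "source_file" is just a column name I chose):
library(tidyverse)
my_files <- list.files(pattern = "\\.rds$")
names(my_files) <- my_files                                 # label each element with its file name
my_all <- map_df(my_files, read_rds, .id = "source_file")  # adds a source_file column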
...
The dplyr::bind_rows() function is explicitly documented as an efficient implementation of the common pattern do.call(rbind, dfs) for binding many data frames into one (https://dplyr.tidyverse.org/reference/bind.html):
library(tidyverse)
write_rds(iris, "iris1.rds")  # write three sample files
write_rds(iris, "iris2.rds")
write_rds(iris, "iris3.rds")
my_files <- list.files(pattern = "\\.rds$")
dat_list <- lapply(my_files, read_rds)    # switched to read_rds() only
my_all <- do.call("bind_rows", dat_list)  # switched to bind_rows()
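As a quick sanity check on the toy example, the three copies of iris should stack into 450 rows and 5 columns:
dim(my_all)
#> [1] 450   5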