Consider one file 'C:/ZFILE' that includes many zip files.
Now, consider that each of these zip includes many csv, among which one specific csv named 'NAME.CSV', all these scattered 'NAME.CSV' being similarly named and structured (i.e., same columns).
How to rbind all these scattered csv?
The script below allows that, but a function would be more appropriate.
How to do this?
Thanks
zfile <- "C:/ZFILE"
zlist <- list.files(path = zfile, pattern = "\\.zip$", recursive = FALSE, full.names = TRUE)
zlist # list all zip from the zfile file
zunzip <- lapply(zlist, unzip, exdir = zfile) # unzip all zip in the zfile file (may takes time depending on the number of zip)
library(data.table) # rbindlist & fread
csv_name <- "NAME.CSV"
csv_list <- list.files(path = zfile, pattern = paste0("\\", csv_name, "$"), recursive = TRUE, ignore.case = FALSE, full.names = TRUE)
csv_list # list all 'NAME.CSV' from the zfile file
csv_rbind <- rbindlist(sapply(csv_list, fread, simplify = FALSE), idcol = 'filename')
You can try this type of function ( you can pass the unzip call directly to the cmd
param of data.table::fread()
)
get_zipped_csv <- function(path) {
fnames = list.files(path,full.names = T)
rbindlist(lapply(fnames, \(f) fread(cmd = paste0("unzip -p ",f))[,src:=f]))
}
Usage:
get_zipped_csv(path = "C:\ZFILE\")