
Zipping very many files in R


I have 7300 *.csv files in a temp directory, and I want to zip them into a single zip archive from R. The code below works, but it takes FOREVER. Is there a faster way to do this, short of leaving R and using WinZip?

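# zipName is defined elsewhere in the script
fileListing       = list.files( pattern = '\\.csv$' )   # list.files() takes a regex, not a glob
outZipFileName    = sub( '.zip', '_TZflts.zip', zipName, fixed = TRUE )
# calls zip() once per file, adding each CSV to the archive separately
sapply(seq_along( fileListing),function(ii) zip( outZipFileName, fileListing[ii] ) )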

Another problem is that the zipping process leaves lots of garbage files behind, besides the zip file and the CSV files it contains.

Thank you.

BSL


Solution

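  • You do not need to loop through the files: zip can take a vector of the files to be zipped, so one call can add all 7300 files instead of running the external zip program 7300 times. This should speed things up considerably. From ?zip: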

    files: A character vector of recorded filepaths to be included.

    Example

    # write some files to be zipped
    for(i in 1:10) write.csv(mtcars, paste0("SOtemp", i, ".csv"))
    
    # zip
    zip("SOzip", files=list.files(pattern="SOtemp\\d"))
    
    # remove files from this example
    # file.remove(c("SOzip.zip", list.files(pattern="SOtemp\\d")))
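Applied to the setup in the question (all CSVs sitting in one temp directory), the single-call approach might look like the sketch below. It is not part of the original answer: the function name zip_csv_dir and its arguments are hypothetical, it assumes an external zip program such as Info-ZIP is on the path (utils::zip is a wrapper around one), and the "-j" (store bare file names, not directory paths) and "-q" (quiet) flags are Info-ZIP options added here on top of R's default "-r9X". Note that list.files() expects a regular expression, so "\\.csv$" is used rather than "*.csv".

```r
# Hypothetical helper: zip every CSV in csv_dir into zip_path with ONE zip() call.
# Assumes an external zip program (e.g. Info-ZIP) is available to utils::zip.
zip_csv_dir <- function(csv_dir, zip_path) {
  # Regex anchored to the end of the name: matches files ending in ".csv"
  csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(csv_files) == 0L) stop("no .csv files found in ", csv_dir)

  # Single call for all files.
  # "-j" stores bare file names instead of full temp-directory paths,
  # "-q" suppresses per-file console output.
  # If zip_path already exists, zip adds/updates entries in it.
  status <- utils::zip(zip_path, files = csv_files, flags = "-r9Xjq")
  if (status != 0L) stop("zip exited with status ", status)

  # Return the zipped file paths (invisibly) so the caller can check or delete them
  invisible(csv_files)
}
```

With zipName defined as in the question, a call could be zip_csv_dir(tempdir(), sub(".zip", "_TZflts.zip", zipName, fixed = TRUE)). Passing full paths together with "-j" avoids having to setwd() into the temp directory first.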