Tags: r, csv, zip, unzip

How to unzip file, change csv table, and zip again?


I have a lot of .zip files. I need to:

  • Open zip file
  • Edit .csv table in it
  • Zip the file again under its original name

Is it possible in R? With many files this is quite a difficult task, because the dataset is large and I need to process the files in sequence. Besides the .csv file, there are a few other files in each zip archive.


Solution

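  • This can be done with the unzip and zip functions. In an lapply loop, we unzip each archive into the working directory, identify the .csv among the extracted files with grep, and read it with read.csv. Then we edit the data and reverse the process: overwrite the .csv and zip it back into the archive. Only the .csv gets updated; the other files are untouched.

    toEdit <- c("df1.zip", "df2.zip", "df3.zip")
    
    lapply(toEdit, function(z) {
      ## extract into the working directory; unzip() returns the extracted paths
      files <- unzip(z)
      ## pick the .csv among the extracted files (drop the leading "./")
      csv <- sub("^\\./", "", grep("\\.csv$", files, value = TRUE))
      r <- read.csv(csv)
      ## edit data
      r <- r/10
      ## end edit data
      ## overwrite the extracted .csv, then update that entry in the archive
      write.csv(r, csv, row.names = FALSE)
      zip(z, csv)
      ## remove the extracted files
      unlink(files)
    })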
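
    If the extracted files should not land in the working directory at all, the same steps can be run inside a temporary directory. The following is only a sketch under a few assumptions: edit_csv_in_zip and edit_fun are hypothetical names introduced here, each archive is assumed to hold exactly one .csv, and, as with zip above, an external zip program must be available.

    ```r
    # Hypothetical helper: edit the single .csv inside a zip archive in place.
    # `edit_fun` is an assumed user-supplied function(data.frame) -> data.frame.
    edit_csv_in_zip <- function(z, edit_fun) {
      z <- normalizePath(z)                 # absolute path, still valid after setwd()
      tmp <- tempfile("unzipped_")
      dir.create(tmp)
      on.exit(unlink(tmp, recursive = TRUE), add = TRUE)

      # list the entries and extract only the .csv into the temporary directory
      entries <- unzip(z, list = TRUE)$Name
      csv <- grep("\\.csv$", entries, value = TRUE)
      stopifnot(length(csv) == 1)
      unzip(z, files = csv, exdir = tmp)

      # read, edit, and overwrite the extracted copy
      path <- file.path(tmp, csv)
      write.csv(edit_fun(read.csv(path)), path, row.names = FALSE)

      # zip stores paths relative to the working directory, so run it from tmp;
      # after = FALSE restores the old directory before tmp is deleted
      old <- setwd(tmp)
      on.exit(setwd(old), add = TRUE, after = FALSE)
      zip(z, csv, flags = "-r9Xq")
      invisible(z)
    }
    ```

    It could then be applied to all archives with lapply(toEdit, edit_csv_in_zip, edit_fun = function(d) d/10).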
    

    Example data:

    This creates three .zip archives, each containing one .csv file and two other files.

    write("foo", "xy1.foo")
    write("foo", "xy2.foo")
    sapply(1:3, function(i) {
      write.csv(data.frame(matrix(1:12, 3, 4)), paste0("df", i, ".csv"))
      zip(paste0("df", i, ".zip"), paste0("df", i, ".csv"))
      zip(paste0("df", i, ".zip"), "xy1.foo")
      zip(paste0("df", i, ".zip"), "xy2.foo")
    })
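
    To check that each archive still holds the .csv together with the other files, before and after editing, the contents can be listed without extracting anything. A small sketch, where zip_contents is a hypothetical helper name introduced here:

    ```r
    # Hypothetical helper: return the entry names of each archive as a named list.
    zip_contents <- function(zips) {
      # unzip(list = TRUE) only reads the archive's table of contents
      lapply(setNames(zips, zips), function(z) unzip(z, list = TRUE)$Name)
    }
    ```

    For the example archives this would be called as zip_contents(paste0("df", 1:3, ".zip")).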