I have a lot of .zip files. I need to:
Is it possible in R? With many files this is quite a difficult task, because the dataset is large and I need to process the files in sequence. Besides the .csv file, there are a few other files in each zip archive.
Using the unzip and zip functions. In an lapply loop, we first create a tempfile location, which is used to unzip the archive and from which we can read.csv. We identify the .csv with grep. Then we edit the data and reverse the process. Only the .csv gets updated; the other files are untouched.
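toEdit <- c("df1.zip", "df2.zip", "df3.zip")
lapply(toEdit, function(z) {
  ## extract into a fresh temporary directory, not the working directory
  temp <- tempfile()
  files <- unzip(z, exdir = temp)
  ## pick the .csv among the extracted files
  csv <- files[grep("\\.csv$", files)]
  r <- read.csv(csv)
  ## edit data
  r <- r/10
  ## end edit data
  ## overwrite the extracted .csv; row.names = FALSE avoids adding an extra column
  write.csv(r, csv, row.names = FALSE)
  ## "-j" (Info-ZIP) stores the file without its directory path, so it replaces the old entry
  zip(z, csv, flags = "-j")
  unlink(temp, recursive = TRUE)
})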
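To check that the other files in each archive really were left alone, one option is to list the archive entries before and after the edit. The following is only a sketch: `zip_entries` is a hypothetical helper name, and it assumes all archives to edit sit in the current working directory, so that `toEdit` can be built with `list.files` instead of being typed by hand.

```r
## Hypothetical helper (not part of the answer above):
## return the sorted entry names of a zip archive without extracting it.
zip_entries <- function(z) sort(unzip(z, list = TRUE)$Name)

## Assumption: every .zip in the working directory should be processed.
toEdit <- list.files(pattern = "\\.zip$")

## Print the entries of each archive; run this before and after the edit
## and compare the output by eye (or store it and use identical()).
for (z in toEdit) {
  cat(z, ":", paste(zip_entries(z), collapse = ", "), "\n")
}
```

If the entry names are the same before and after, only the content of the .csv has changed.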
Example data:
Creating .zip archives with one .csv file and some other stuff in it.
write("foo", "xy1.foo")
write("foo", "xy2.foo")
sapply(1:3, function(i) {
  write.csv(data.frame(matrix(1:12, 3, 4)), paste0("df", i, ".csv"))
  zip(paste0("df", i, ".zip"), paste0("df", i, ".csv"))
  zip(paste0("df", i, ".zip"), "xy1.foo")
  zip(paste0("df", i, ".zip"), "xy2.foo")
})