I'm trying to zip a massive directory of images that will be fed into a deep learning system. This is incredibly time consuming, so I would like to be able to stop the zipping process prematurely with Ctrl+C and zip the directory in different "batches".
Currently I'm using zip -r9v folder.zip folder, and I've seen that the -u option allows updating changed files and adding new ones.
I'm worried about some file, or the zip archive itself, ending up corrupted if I terminate the process with Ctrl+C. From this answer I understand that cp can be terminated safely, and this other answer suggests that gzip is also safe.
Putting it all together: Is it safe to end the zip command prematurely? Is the -u option viable for zipping in different batches?
Is it safe to end the zip command prematurely?
In my tests, canceling zip (Info-ZIP 3.0, 16 June 2008) with Ctrl+C did not create a zip archive at all, even when the already compressed data was 2.5 GB. Therefore, I would say Ctrl+C is "safe": you won't end up with a corrupted file, but it is also pointless, because you did all that work for nothing.
Is the -u option viable for zipping in different batches?
Yes. Zip archives compress each file individually, so an archive built by adding files over several runs is just as good as one created by adding all the files in a single run. Just remember that starting zip takes time too, so set the batch size as high as is acceptable to save time.
Here is a script that adds all your files to the zip archive, but gives you a chance to stop the compression after each batch of 100 files.
#! /bin/bash
batchsize=100
shopt -s globstar
files=(folder/**)
echo "Press enter to stop the compression after the current batch."
# Resume a stopped run by setting startfile first, e.g.: startfile=300 ./thisscript
for ((startfile=${startfile:-0}; startfile<"${#files[@]}"; startfile+=batchsize)); do
    # Only the very first batch creates the archive; later batches update it with -u.
    ((startfile==0)) && u= || u=u
    zip "-r9v$u" folder.zip "${files[@]:startfile:batchsize}"
    # read -t 0 succeeds if a line (the pressed enter) is waiting on stdin.
    if read -t 0; then
        echo "Compression stopped after the batch starting at file $startfile."
        echo "Re-run this script with startfile=$((startfile+batchsize)) to continue."
        exit
    fi
done
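If you stop the script, a later run can pick up where it left off by passing the offset it reported; assuming the script above is saved as batchzip.sh (the filename is just an example):

startfile=300 ./batchzip.sh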
For more speed you might want to look into alternative zip implementations.
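For example, 7-Zip can also write standard .zip archives and can compress with several threads, which may help with a directory this large; something along these lines (I have not benchmarked it against Info-ZIP here, so treat it as a suggestion to try):

7z a -tzip -mx=9 -mmt=on folder.zip folder    # -tzip: zip format, -mx=9: best compression, -mmt=on: multithreading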