
Bash script - Parallel unzip and waiting for ending


I need to uncompress some archives and I'd like to speed up the process. Here is the relevant piece of my script:

for archive in $path; do

    STEM=$(basename "$archive" .gz)
    gunzip -c "$archive" > "$here/$STEM"

done

for file in "$here"/*; do
     ... processing ...
done

Is there a way to uncompress multiple (all) archives at once and wait for them all to complete?

In other words, I need something like this:

for archive in $path; do

    ... parallel unzip ...

done

WAIT

for file in "$here"/*; do
     ... processing ...
done

Thanks


Solution

  • You can do it quite concisely and simply with GNU Parallel like this:

    export here   # the sub-shells started by parallel must be able to see $here
    parallel 'gunzip -c {} > "$here/$(basename {} .gz)"' ::: $path
    

    Please use a copy of a few files in a small directory for testing until you get the hang of it.

    If you have 10,000 files to unzip, this will not suddenly start 10,000 unzip jobs. Instead, if you have, say, 8 CPU cores, it will run 8 unzip processes at a time until all 10,000 are done. With the -j option you can change the number of simultaneous jobs to a fixed number, or to a percentage of the available CPUs.

    You can also get a progress meter with parallel --progress ... or parallel --bar ....

    You can also ask GNU Parallel to tell you what it would do without doing anything by using parallel --dry-run ....
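If you would rather not install GNU Parallel, plain bash job control gives exactly the WAIT behaviour asked for: start each gunzip in the background with &, then call wait, which blocks until every background job has finished. A minimal sketch (the scratch directories and sample file names are made up for the demo):

```shell
# Demo setup: create a few .gz archives in scratch directories (assumed names).
src=$(mktemp -d); here=$(mktemp -d)
for i in 1 2 3; do printf 'payload %s\n' "$i" | gzip > "$src/file$i.gz"; done

# Start all decompressions concurrently, then block until every one has exited.
for archive in "$src"/*.gz; do
    stem=$(basename "$archive" .gz)
    gunzip -c "$archive" > "$here/$stem" &
done
wait    # returns only when all background gunzip jobs are done

ls "$here"
```

Note that, unlike GNU Parallel, this starts one background job per archive with no limit on concurrency, so it is only sensible for a modest number of files.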
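Another widely available option is xargs with -P, which caps the number of concurrent jobs (here 4) and, like wait, does not return until all of them have finished. A sketch under the same made-up demo layout; $here must be exported so the sub-shells spawned by xargs can see it:

```shell
# Demo setup: scratch directories and sample archives (names are assumptions).
src=$(mktemp -d); here=$(mktemp -d); export here
for i in 1 2 3 4 5; do printf 'data %s\n' "$i" | gzip > "$src/f$i.gz"; done

# Run at most 4 gunzip jobs at a time; xargs exits when all have completed.
printf '%s\0' "$src"/*.gz |
    xargs -0 -P 4 -I{} sh -c 'gunzip -c "$1" > "$here/$(basename "$1" .gz)"' _ {}

ls "$here"
```

The NUL-delimited pipeline (printf '%s\0' with xargs -0) keeps file names with spaces intact.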