
Unzip all gz files in all subdirectories in the terminal


Is there a way to unzip all .gz files in place, each in the folder that contains it, when the files are spread across subdirectories? A query for

find -type f -name "*.gz"

Gives results like this:

./datasets/auto/auto.csv.gz
./datasets/prnn_synth/prnn_synth.csv.gz
./datasets/sleep/sleep.csv.gz
./datasets/mfeat-zernike/mfeat-zernike.csv.gz
./datasets/sonar/sonar.csv.gz
./datasets/wine-quality-white/wine-quality-white.csv.gz
./datasets/ring/ring.csv.gz
./datasets/diabetes/diabetes.csv.g

Solution

  • If you want to run "gzip -d" on each of those files:

    cd theparentdir && gzip -d $(find ./ -type f -name '*.gz')
    

    and then, to compress them again:

    cd theparentdir && gzip $(find ./ -type f -name '*.csv')
    

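    This will, however, fail in several cases:

    • if file names contain special characters (spaces, tabs, newlines, etc.), because the unquoted $(...) output is split on whitespace
    • if file names contain glob characters (*, ?, [), which the shell then tries to expand
    • if there are too many files to fit on a single gzip command line ("Argument list too long")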
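
To see the first failure mode for yourself, here is a small sketch that builds a throwaway directory with one archive whose path contains spaces. The directory layout and the file name "my data.csv" are made up for the illustration.

```shell
# Throwaway sandbox; "my data" is a hypothetical name chosen to contain a space.
tmp=$(mktemp -d)
mkdir -p "$tmp/datasets/my data"
printf 'a,b\n1,2\n' > "$tmp/datasets/my data/my data.csv"
gzip "$tmp/datasets/my data/my data.csv"

cd "$tmp" || exit 1

# The unquoted $(...) is split on whitespace, so gzip receives three
# arguments ("./datasets/my", "data/my", "data.csv.gz"), none of which exist.
status=0
gzip -d $(find ./ -type f -name '*.gz') 2>/dev/null || status=$?
echo "gzip exit status: $status"

# The archive is still there, untouched.
ls "datasets/my data"
```

gzip should report a non-zero exit status here, and the listing should still show only the .gz file.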

    A more robust approach, if you have GNU find and xargs (the BSD versions also support these options), is:

    find ... -print0 | xargs -0 gzip -d # for the gunzip step; NUL-separated names survive spaces, tabs and even newlines
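
As a quick check of how the NUL-delimited pipeline copes with awkward names, the sketch below creates two archives in a scratch directory, one with a space and one with an embedded newline in its name, and decompresses both. The names are invented for the test; it assumes a find and xargs that support -print0 and -0.

```shell
# Throwaway sandbox with two deliberately awkward (hypothetical) file names.
tmp=$(mktemp -d)
mkdir -p "$tmp/datasets/odd"
nl_name=$(printf 'line one\nline two.csv')   # name with an embedded newline
printf 'a,b\n1,2\n' > "$tmp/datasets/odd/$nl_name"
printf 'a,b\n3,4\n' > "$tmp/datasets/odd/with space.csv"
gzip "$tmp/datasets/odd/$nl_name" "$tmp/datasets/odd/with space.csv"

cd "$tmp" || exit 1

# Each path is terminated by a NUL byte, which cannot occur inside a file name,
# so xargs hands every path to gzip intact.
find . -type f -name '*.gz' -print0 | xargs -0 gzip -d

# Count the archives that are left (tr strips the padding some wc versions add).
remaining=$(find . -type f -name '*.gz' | wc -l | tr -d ' ')
echo "archives left: $remaining"
```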

    Another (arguably better) solution, which works with GNU find and with any other find that implements the POSIX "-exec ... {} +" form:

    cd theparentdir && find ./ -type f -name '*.gz' -exec gzip -d '{}' '+'
    

    and to re-compress every .csv file in that parent directory and all its subdirectories:

    cd theparentdir && find ./ -type f -name '*.csv' -exec gzip '{}' '+'
    

    "+" tells GNU find to try to put as many found files as it can on each gzip invocation (instead of doing 1 gzip incocation per file, very very ressource intensive and very innefficient and slow), similar to xargs, but with some benefits (1 command only, no pipe needed)