linux bash gzip

Recursively find and extract gz files without extension


I have a folder structure (a bad rip of an old website; you may know the old double-compression problem) in which some files are named somefilename.txt but are actually gzip-compressed. I am looking for a way to identify these files and then unpack them into the txt files that they actually are. The environment is Linux.


Solution

  • You could use a bash function that tests the file type with file and decompresses the file if it is gzip-compressed. You can then call the (exported) function from the -execdir action of find:

    if_gz_decompress () {
      local type tmp
      type=$(file -bi "$1")
      # Older versions of file report application/x-gzip, so accept both.
      if [[ "$type" =~ ^application/(x-)?gzip\; ]]; then
        tmp=$(mktemp) &&
        gunzip -c "$1" > "$tmp" &&
        mv -i "$tmp" "$1" ||
        rm -f "$tmp"
      fi
    }
    export -f if_gz_decompress
    find . -type f -execdir bash -c 'if_gz_decompress "$1"' _ {} \;
    

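    Because of the -i option, mv asks for confirmation before it overwrites each file. Remove it once you are sure the function does what you want (or if you have a backup). Note that the replacement file takes the permissions of the temporary file (mktemp creates it readable and writable by the owner only), not those of the original.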

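    If you have many such files and several of them share a directory, you can reduce the number of bash invocations by looping over the arguments and ending -execdir with + instead of \;, so that find passes several files from the same directory to one call: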

    if_gz_decompress () {
      local f type tmp
      for f in "$@"; do
        type=$(file -bi "$f")
        # Older versions of file report application/x-gzip, so accept both.
        if [[ "$type" =~ ^application/(x-)?gzip\; ]]; then
          tmp=$(mktemp) &&
          gunzip -c "$f" > "$tmp" &&
          mv -i "$tmp" "$f" ||
          rm -f "$tmp"
        fi
      done
    }
    export -f if_gz_decompress
    find . -type f -execdir bash -c 'if_gz_decompress "$@"' _ {} +
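
Before overwriting anything, it may help to do a dry run that only lists the candidates. The sketch below is not part of the answer above: list_gz_files is a hypothetical helper name, and instead of calling file it checks the two-byte gzip magic number (1f 8b) directly and then lets gzip -t confirm that the stream really decompresses. It assumes head -c, od and gzip are available, which is the case on a typical Linux system.

```bash
# Dry run: list regular files below a directory that hold gzip data,
# whatever their extension. Nothing is modified.
# list_gz_files is a hypothetical helper name used for illustration only.
list_gz_files () {
  find "${1:-.}" -type f -exec sh -c '
    for f do
      # A gzip stream starts with the two bytes 1f 8b.
      magic=$(head -c 2 "$f" | od -An -tx1 | tr -d " \n")
      [ "$magic" = 1f8b ] || continue
      # gzip -t decompresses to nowhere and fails on damaged or fake streams.
      if gzip -t < "$f" 2>/dev/null; then
        printf "%s\n" "$f"
      fi
    done
  ' sh {} +
}
```

Running list_gz_files . from the top of the rip prints one path per line. The output is newline-delimited, so it is meant for reading by eye rather than for piping into other tools when file names may contain newlines.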