tarfile doesn't work for .gz files


I have a nested tarfile in the form of

tarfile.tar.gz
--tar1.gz
  --tar1.txt
--tar2.gz
--tar3.gz

I wanted to write a little script in Python to extract all the archives breadth-first into the same folder hierarchy, i.e. tar1.txt should end up in tarfile/tar1/.

Here's the script:

#!/usr/bin/python

import os
import re
import tarfile

data = os.path.join(os.getcwd(), 'data')
dirs = [data]

while len(dirs):
    dirpath = dirs.pop(0)
    for subpath in os.listdir(dirpath):
        if not re.search(r'(\.tar)?\.gz$', subpath):
            continue
        with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
            tarf.extractall(path=dirpath)
    for subpath in os.listdir(dirpath):
        newpath = os.path.join(dirpath, subpath)
        if os.path.isdir(newpath):
            dirs.append(newpath)
        elif dirpath != data or os.path.islink(newpath):
            os.remove(newpath)

But when I run the script, I get the following error:

Traceback (most recent call last):
  File "./extract.py", line 16, in <module>
    with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
  File "/usr/lib/python2.7/tarfile.py", line 1678, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

The '.tar.gz' file is extracted fine, but the nested '.gz' files are not. What's going on here? Does the tarfile module not handle .gz files?


Solution

  • .gz denotes that the file is gzipped; .tar.gz means a tar file that has been gzipped. tarfile handles gzipped tars perfectly well, but it doesn't handle files that aren't tar archives (like your tar1.gz). A bare .gz is a single compressed file, so decompress it with the gzip module instead.
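
One way to handle both cases is to ask tarfile whether the file really is a tar archive and fall back to gzip when it isn't. This is a minimal sketch rather than a drop-in fix: the helper name unpack and its return value are made up for illustration, and it assumes every non-tar file ending in .gz is a plain gzip file.

```python
import gzip
import os
import shutil
import tarfile


def unpack(path, dest_dir):
    """Unpack `path` into `dest_dir`; return True if anything was unpacked.

    Hypothetical helper, not part of the original script.
    """
    if tarfile.is_tarfile(path):
        # Covers plain .tar as well as gzip-compressed .tar.gz archives.
        with tarfile.open(path) as tarf:
            tarf.extractall(path=dest_dir)
        return True
    if path.endswith('.gz'):
        # A bare .gz holds one compressed file, not an archive:
        # write the decompressed bytes to the same name minus '.gz'.
        name = os.path.basename(path)[:-len('.gz')]
        target = os.path.join(dest_dir, name)
        with gzip.open(path, 'rb') as src, open(target, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        return True
    # Neither a tar archive nor a .gz file: leave it alone.
    return False
```

In the loop from the question, unpack(os.path.join(dirpath, subpath), dirpath) could stand in for the tarfile.open block. Note that a gzip file does not produce a folder: tar1.gz would be written out as a single file named tar1, so getting tar1.txt into tarfile/tar1/ would still need an extra rename or mkdir step.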