I have a nested tar file structured like this:
tarfile.tar.gz
--tar1.gz
----tar1.txt
--tar2.gz
--tar3.gz
I wanted to write a small Python script that extracts all the tars breadth-first into a matching folder hierarchy, i.e. tar1.txt should end up in tarfile/tar1/.
Here's the script:
```python
#!/usr/bin/python
import os
import re
import tarfile

data = os.path.join(os.getcwd(), 'data')
dirs = [data]  # queue of directories still to scan (breadth-first)
while dirs:
    dirpath = dirs.pop(0)
    # Extract every *.gz / *.tar.gz in this directory into the directory itself.
    for subpath in os.listdir(dirpath):
        if not re.search('(.tar)?.gz$', subpath):
            continue
        with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
            tarf.extractall(path=dirpath)
    # Queue subdirectories; remove every other file below the top level
    # (and symlinks in the top level).
    for subpath in os.listdir(dirpath):
        newpath = os.path.join(dirpath, subpath)
        if os.path.isdir(newpath):
            dirs.append(newpath)
        elif dirpath != data or os.path.islink(newpath):
            os.remove(newpath)
```
But when I run the script, I get the following error:
```
Traceback (most recent call last):
  File "./extract.py", line 16, in <module>
    with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
  File "/usr/lib/python2.7/tarfile.py", line 1678, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
```
The .tar.gz file is extracted fine, but the nested .gz files are not. What's going on here? Does the tarfile module not handle .gz files?
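.gz only denotes that a file is gzip-compressed; .tar.gz denotes a tar archive that has then been gzipped. tarfile handles gzipped tars perfectly well, but it can't open a file that isn't a tar archive at all (like your tar1.gz), which is why tarfile.open() raises ReadError. A plain .gz file has to be decompressed with the gzip module instead.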
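One way to fold this into the breadth-first loop from the question is to ask tarfile.is_tarfile() before opening each .gz, send real tar archives through tarfile, and decompress everything else with the gzip module. This is a sketch under a few assumptions: it targets Python 3 rather than the question's Python 2.7; extracting each archive into a folder named after it (so tar1.gz lands in tar1/) is my guess at the layout you want; and strip_archive_suffix, extract_tar, gunzip and extract_nested are hypothetical helper names, not part of any library.

```python
# Sketch (Python 3). Helper names are hypothetical; unpacking each archive
# into a folder named after it is an assumed layout, not the asker's code.
import gzip
import os
import shutil
import tarfile
from collections import deque


def strip_archive_suffix(name):
    """Hypothetical helper: 'tarfile.tar.gz' -> 'tarfile', 'tar1.gz' -> 'tar1'."""
    for suffix in ('.tar.gz', '.tgz', '.gz'):
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


def extract_tar(path, dest):
    """Extract a (possibly compressed) tar archive into dest."""
    with tarfile.open(path) as tarf:
        if hasattr(tarfile, 'data_filter'):
            # Python 3.12+ and security backports: reject absolute paths,
            # '..' components and other unsafe members.
            tarf.extractall(path=dest, filter='data')
        else:
            tarf.extractall(path=dest)


def gunzip(path, dest_file):
    """Decompress a plain .gz, which holds one file rather than an archive."""
    with gzip.open(path, 'rb') as src, open(dest_file, 'wb') as dst:
        shutil.copyfileobj(src, dst)


def extract_nested(root):
    """Breadth-first walk from root, unpacking every .gz/.tgz it finds."""
    queue = deque([root])
    while queue:
        dirpath = queue.popleft()
        for name in sorted(os.listdir(dirpath)):
            path = os.path.join(dirpath, name)
            if os.path.isdir(path):
                queue.append(path)
            elif not name.endswith(('.gz', '.tgz')):
                continue
            elif tarfile.is_tarfile(path):
                # Gzipped tar: unpack into a folder named after the archive
                # and queue that folder so its contents are visited later.
                dest = os.path.join(dirpath, strip_archive_suffix(name))
                os.makedirs(dest, exist_ok=True)
                extract_tar(path, dest)
                queue.append(dest)
            else:
                # Plain gzip: decompress next to the original file.
                gunzip(path, os.path.join(dirpath, strip_archive_suffix(name)))
```

For example, extract_nested(os.path.join(os.getcwd(), 'data')) would unpack data/tarfile.tar.gz into data/tarfile/ (assuming the archive's members have no top-level folder of their own), then on the next pass unpack data/tarfile/tar1.gz into data/tarfile/tar1/ if it is a gzipped tar, or gunzip it to the file data/tarfile/tar1 if it is a plain .gz. Unlike the original loop, this sketch leaves the archives in place, and it uses collections.deque so that taking the next directory from the front of the queue is O(1) instead of the O(n) of list.pop(0).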