python extract bzip2 tarfile bz2

Extracting a .ppm.bz2 from a custom path to a custom path


As the title says, I have several folders containing several .ppm.bz2 files, and I want to extract them exactly where they are using Python.

[Image: directory structure]

I am traversing the folders like this:

 import tarfile
 import os
 path = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'
 folders = os.listdir(path)
 for folder in folders:  #the folders starting like 00001
     if not folder.startswith("0"):
         continue  # skip entries that are not numbered folders
     path2 = path + folder
     zips = os.listdir(path2)
     for zip in zips:
         if not zip.startswith("0"):
             continue  # skip files that do not start with "0"
         path3 = path2+"/"+zip

         fh = tarfile.open(path3, 'r:bz2')
         outpath = path2+"/"
         fh.extractall(outpath)
         fh.close()

Then I get this error:

Traceback (most recent call last):
  File "ZIP.py", line 16, in <module>
    fh = tarfile.open(path3, 'r:bz2')
  File "/anaconda2/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/anaconda2/lib/python2.7/tarfile.py", line 2370, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header



Solution

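  • The tarfile module is for tar archives, including .tar.bz2. A .ppm.bz2 is a single bzip2-compressed file, not a tar archive, which is why tarfile raises "ReadError: invalid header". If your file is not a tar archive, you should use the bz2 module directly.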

    Also, try using os.walk instead of nested listdir calls, as it traverses the whole tree for you:

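    import os
    import bz2
    import shutil

    # Top-level folder to search; same path as in the question.
    root = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'

    for dirpath, dirs, files in os.walk(root):
        for filename in files:
            basename, ext = os.path.splitext(filename)
            if ext.lower() != '.bz2':
                continue  # leave non-bz2 files alone
            fullname = os.path.join(dirpath, filename)
            newname = os.path.join(dirpath, basename)  # x.ppm.bz2 -> x.ppm
            # bz2.BZ2File works on Python 2.7 and 3; bz2.open needs Python 3.3+
            with bz2.BZ2File(fullname, 'rb') as fh, open(newname, 'wb') as fw:
                shutil.copyfileobj(fh, fw)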
    

    This decompresses every .bz2 file in every subfolder, writing the result next to the original. All other files are left untouched, and the .bz2 originals are kept. If the decompressed file already exists, it will be overwritten.

    Please back up your data before running destructive code.
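
If some of your files might be real .tar.bz2 archives and others plain .bz2 files, or if you want the output in a different folder (the "custom path" from the title), something along these lines may help. This is only a sketch: `extract_bz2` is a hypothetical helper name, not part of the standard library or of the code above. It relies on `tarfile.is_tarfile` to tell the two cases apart and otherwise falls back to `bz2.BZ2File`.

```python
import bz2
import os
import shutil
import tarfile


def extract_bz2(src, out_dir):
    """Unpack src into out_dir and return the list of paths written.

    Hypothetical helper for illustration, not from the original answer.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    if tarfile.is_tarfile(src):
        # A real tar archive (e.g. .tar.bz2): let tarfile unpack every member.
        # Only do this with archives you trust.
        with tarfile.open(src, 'r:*') as archive:
            names = archive.getnames()
            archive.extractall(out_dir)
        return [os.path.join(out_dir, name) for name in names]
    # A single bzip2-compressed file such as image.ppm.bz2: drop the final
    # extension and stream the decompressed bytes into the new file.
    base = os.path.splitext(os.path.basename(src))[0]
    target = os.path.join(out_dir, base)
    with bz2.BZ2File(src, 'rb') as fh, open(target, 'wb') as fw:
        shutil.copyfileobj(fh, fw)
    return [target]
```

In the loop from the question, calling `extract_bz2(path3, path2)` would write each decompressed file next to its .bz2 source, while passing any other folder as the second argument sends the output there instead.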