python python-3.x zip

Extract the content of a specific folder from a zip archive in Python3


I have a zip archive whose internal structure looks like this:

file.zip
  |
   --- foo/
  |
   --- bar/
        |
         --- file1.txt
        |
         --- dir/
              |
               --- file2.txt

and I would like to extract the contents of bar to an output directory using Python 3, getting something that looks like this:

output-dir/
    |
     --- file1.txt
    |
     --- dir/
          |
           --- file2.txt

However, when I run the code below, both bar and its contents are extracted to output-dir:

import zipfile

archive = zipfile.ZipFile('path/to/file.zip')

for archive_item in archive.namelist():
    if archive_item.startswith('bar/'):
        archive.extract(archive_item, 'path/to/output-dir')

How can I tackle this problem? Thanks!


Solution

  • ZipFile.extract always recreates the member's full archive path (bar/ included) under the target directory. Instead, use ZipFile.open together with the built-in open and shutil.copyfileobj to write each file exactly where you want it, using path manipulation to build the output path you need.

    import os
    import pathlib
    import shutil
    import zipfile

    PREFIX = 'bar/'
    out = pathlib.Path('path/to/output-dir')
    with zipfile.ZipFile('path/to/file.zip') as archive:
        for archive_item in archive.namelist():
            if not archive_item.startswith(PREFIX):
                continue
            # strip the leading prefix, then join to `out`; note that you
            # may want to guard against path traversal if the zip
            # file comes from an untrusted source
            destpath = out.joinpath(archive_item[len(PREFIX):])
            if archive_item.endswith('/'):
                # directory entry: create it and move on, there is nothing to copy
                os.makedirs(destpath, exist_ok=True)
                continue
            # make sure the destination directory exists, otherwise `open` will fail
            os.makedirs(destpath.parent, exist_ok=True)
            with archive.open(archive_item) as source, \
                 open(destpath, 'wb') as dest:
                shutil.copyfileobj(source, dest)
    

    Something along those lines should do it.
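
    The comment in the code above mentions guarding against path traversal. A minimal sketch of one way to do that, assuming Python 3.6+ and that rejecting a suspicious entry outright is acceptable. The helper name safe_destination is hypothetical, not part of zipfile: it resolves the candidate path and checks that it still sits inside the output directory.

```python
import pathlib


def safe_destination(out_dir, member_name, prefix):
    """Map a zip member name to a path under out_dir.

    Hypothetical helper: raises ValueError if the member, once the prefix
    is stripped, would land outside out_dir (e.g. via '..' or an absolute path).
    """
    out_root = pathlib.Path(out_dir).resolve()
    relative = member_name[len(prefix):]
    # resolve() collapses any '..' components so the comparison below is reliable
    candidate = (out_root / relative).resolve()
    if candidate != out_root and out_root not in candidate.parents:
        raise ValueError(f"unsafe path in archive: {member_name!r}")
    return candidate
```

    In the loop above you would then build destpath with safe_destination(out, archive_item, PREFIX) instead of out.joinpath(...), and decide whether to skip or abort when it raises.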