
Python: how to extract files from a zip without including the parent directory


I'm a noob when it comes to Python and I need help! I am trying to use the zipfile module with the latest version of Python 3, and I'm having trouble excluding the parent directory (folder) when extracting the zip file.

An example of the zip file's directory structure looks like this:

Content/
Content/Runtime/
Content/Runtime/test.txt
Content/Runtime/test.png
Content/Documentation/
Content/Documentation/license.txt

I don't know how to exclude the parent directory (the first folder) from extraction. I only want the Runtime and Documentation directories and their contents extracted.

Please help, thanks, and much appreciated!


Solution

  • In this case, the built-in extraction methods (`extract` and `extractall`) won't work, as they recreate the full path stored in the archive, but you can easily do it yourself:

    import os
    import shutil
    import zipfile
    
    destination_dir = './destination'
    
    with zipfile.ZipFile('./tmp.zip', 'r') as z:
        # See https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.infolist
        for file_info in z.infolist():
            # Only extract regular files
            if file_info.is_dir():
                continue
    
            file_path = file_info.filename
            # Only extract things under 'Content/'
            if not file_path.startswith('Content/'):
                continue
    
            # Split at slashes, at most one time, and take the second part
            # so that we skip the 'Content/' part
            extracted_path = file_path.split('/', 1)[1]
    
            # Combine with the destination directory
            extracted_path = os.path.join(destination_dir, extracted_path)
            print(extracted_path)
    
            # Make sure the directory for the file exists
            os.makedirs(os.path.dirname(extracted_path), exist_ok=True)
    
            # Extract the file. Don't use `z.extract`, as it would recreate
            # the full path from inside the zip ('Content/...').
            # WARNING: This code does not check for path traversal vulnerabilities.
            # See the warning under ZipFile.extractall in the zipfile docs for details.
            with open(extracted_path, 'wb') as dst:
                with z.open(file_info, 'r') as src:
                    shutil.copyfileobj(src, dst)
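
Since the code above warns that it does not guard against path traversal, here is a minimal sketch of the same loop wrapped in a function, with one possible check added. The function name `extract_without_prefix` and its parameters are hypothetical (they are not part of `zipfile`), and the check shown, comparing resolved paths with `os.path.commonpath`, is only one way to do it, so treat this as a starting point rather than a hardened implementation:

```python
import os
import shutil
import zipfile


def extract_without_prefix(zip_path, destination_dir, prefix='Content/'):
    """Extract files stored under `prefix`, dropping `prefix` from their paths.

    Hypothetical helper for illustration. Returns the list of paths written.
    """
    destination_root = os.path.realpath(destination_dir)
    written = []

    with zipfile.ZipFile(zip_path, 'r') as z:
        for file_info in z.infolist():
            name = file_info.filename
            # Skip directory entries and anything outside the prefix
            if file_info.is_dir() or not name.startswith(prefix):
                continue

            # Drop the leading 'Content/' and resolve the final location
            relative_path = name[len(prefix):]
            target = os.path.realpath(
                os.path.join(destination_root, relative_path))

            # Refuse entries such as 'Content/../../evil.txt' that would
            # resolve to a location outside the destination directory
            if os.path.commonpath([destination_root, target]) != destination_root:
                raise ValueError(f'unsafe path in archive: {name!r}')

            # Make sure the directory for the file exists, then copy the data
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with z.open(file_info, 'r') as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)

    return written
```

With the archive from the question, calling `extract_without_prefix('./tmp.zip', './destination')` should produce `destination/Runtime/...` and `destination/Documentation/...` with no `Content` folder in between.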