How to extract a zip file recursively?


I have a zip file that contains three zip files, like this:

zipfile.zip\  
    dirA.zip\
         a  
    dirB.zip\
         b  
    dirC.zip\
         c

I want to extract each inner zip file into a directory named after it (dirA, dirB, dirC).
Basically, I want to end up with the following layout:

output\  
    dirA\
         a  
    dirB\
         b  
    dirC\
         c
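To experiment with this layout without the original file, a minimal sketch that builds a sample archive of the same shape can help. The helper name build_nested_zip and the file contents are hypothetical; the sketch uses only the standard zipfile and io modules, building each inner zip in an io.BytesIO buffer and storing the finished bytes as a member of the outer archive.

```python
import io
import zipfile


def build_nested_zip(path):
    """Write a sample archive shaped like the one in the question (hypothetical helper).

    The outer archive at `path` holds dirA.zip, dirB.zip and dirC.zip; each inner
    archive holds a single small text file named a, b or c.
    """
    with zipfile.ZipFile(path, "w") as outer:
        for name, member in [("dirA", "a"), ("dirB", "b"), ("dirC", "c")]:
            # Build each inner zip entirely in memory.
            buffer = io.BytesIO()
            with zipfile.ZipFile(buffer, "w") as inner:
                inner.writestr(member, f"contents of {member}\n")
            # Closing the inner ZipFile writes its central directory; the
            # BytesIO buffer stays open, so its complete bytes can be stored.
            outer.writestr(f"{name}.zip", buffer.getvalue())
```

For example, build_nested_zip("zipfile.zip") writes the sample archive to the current directory.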

I have tried the following:

import os, re
from zipfile import ZipFile

# directory, data, pattern and self.archive_name are defined elsewhere
os.makedirs(directory)  # where directory is "\output"
with ZipFile(self.archive_name, "r") as archive:
    for dir_id, files in data.items():
        if files:
            print("Creating", dir_id)
            dirpath = os.path.join(directory, dir_id)

            os.mkdir(dirpath)

            for file in files:
                match = pattern.match(file)
                new = match.group(2)
                new_filename = os.path.join(dirpath, new)

                # copy the member's raw bytes to the new file
                content = archive.open(file).read()
                with open(new_filename, "wb") as outfile:
                    outfile.write(content)

But it only copies the inner zip files out of the outer one without extracting them, so I end up with:

output\  
    dirA\
         dirA.zip 
    dirB\
         dirB.zip 
    dirC\
         dirC.zip

Any suggestions, including code snippets, would be much appreciated, because I have tried many different things and read the docs without success.
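The copied files are still zip archives: archive.open(file).read() returns the raw bytes of each inner archive, so writing those bytes to disk just recreates dirA.zip and so on. A minimal sketch to confirm this, using a hypothetical helper named inner_members_are_zips; it relies on zipfile.is_zipfile, which accepts a file-like object as well as a path.

```python
import io
import zipfile


def inner_members_are_zips(outer_path):
    """Map each member of the outer archive to True if its bytes form a zip archive.

    Hypothetical helper for diagnosing the behaviour described above.
    """
    result = {}
    with zipfile.ZipFile(outer_path) as outer:
        for name in outer.namelist():
            data = outer.read(name)  # same bytes as outer.open(name).read()
            # Wrap the bytes in a file-like object so is_zipfile can inspect them.
            result[name] = zipfile.is_zipfile(io.BytesIO(data))
    return result
```

Every member reported as True has to be opened again with zipfile.ZipFile before its own contents can be extracted.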


Solution

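  • When extracting the outer zip file, read the inner zip files into memory instead of writing them to disk. To do this, I've used io.BytesIO, which wraps each inner archive's bytes in a file-like object that zipfile.ZipFile can open directly.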

    Check out this code:

    import io
    import os
    import zipfile
    
    def extract(filename):
        with zipfile.ZipFile(filename) as z:
            for f in z.namelist():
                # get directory name from file (dirA.zip -> dirA)
                dirname = os.path.splitext(f)[0]
                # create new directory (no error if it already exists)
                os.makedirs(dirname, exist_ok=True)
                # read inner zip file into an in-memory bytes buffer
                content = io.BytesIO(z.read(f))
                with zipfile.ZipFile(content) as zip_file:
                    # extract every member of the inner zip into dirname
                    zip_file.extractall(dirname)
    

    If you run extract("zipfile.zip") with zipfile.zip as:

    zipfile.zip/
        dirA.zip/
            a
        dirB.zip/
            b
        dirC.zip/
            c
    

    The output should be:

    dirA/
      a
    dirB/
      b
    dirC/
      c
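The code above handles exactly one level of nesting. If the inner archives may themselves contain zip files, a recursive variant is one option. This is a minimal sketch, not part of the original answer: the function name extract_recursive is hypothetical, and it assumes that any member whose name ends in ".zip" should be unpacked into a directory named after it rather than written to disk.

```python
import io
import os
import zipfile


def extract_recursive(source, dest):
    """Extract `source` into `dest`, unpacking nested .zip members in place.

    `source` may be a path or a file-like object. A member such as dirA.zip is
    not written to disk; its contents go into dest/dirA, and any zip files found
    inside it are handled the same way.
    """
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(".zip"):
                # Read the nested archive into memory and recurse into it.
                nested = io.BytesIO(archive.read(info))
                subdir = os.path.join(dest, os.path.splitext(info.filename)[0])
                extract_recursive(nested, subdir)
            else:
                # Ordinary files and directory entries are extracted as-is.
                archive.extract(info, dest)
```

Calling extract_recursive("zipfile.zip", "output") should produce the output/dirA, output/dirB, output/dirC layout from the question. Unlike ZipFile.extract, the directory names derived from nested .zip members are not sanitized here, so this sketch is only suitable for trusted archives.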