python zip unzip

Extracting files from deep within a zip file in python


I am working on a script that automates data extraction from the zip files we receive from the Sentinel-2B satellite.

The files I need are inside the zip file, at a path like this: zipfile.zip/somefolder.SAFE/GRANULE/main_folder/IMG_DATA/

They are all .jp2 files, and I need to extract them to another path with the following structure: my_path/main_folder/

I need to keep the main_folder name from the zip file, because it varies from file to file.
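If the .jp2 files always sit under GRANULE/<main_folder>/IMG_DATA/ as described above, the main_folder name can be read directly from each member's path inside the archive. Here is a minimal sketch that assumes exactly that layout; granule_folder is a hypothetical helper name, not part of the zipfile module:

```python
from pathlib import PurePosixPath
from typing import Optional


def granule_folder(member: str) -> Optional[str]:
    """Return main_folder for a member like
    somefolder.SAFE/GRANULE/<main_folder>/IMG_DATA/<name>.jp2, else None.
    (Hypothetical helper; assumes the layout described in the question.)"""
    if not member.lower().endswith(".jp2"):
        return None
    parts = PurePosixPath(member).parts  # zip member names always use "/"
    # Look for GRANULE/<main_folder>/IMG_DATA followed by at least one more part
    for idx in range(len(parts) - 3):
        if parts[idx] == "GRANULE" and parts[idx + 2] == "IMG_DATA":
            return parts[idx + 1]
    return None
```

Members outside IMG_DATA (metadata, QI_DATA masks and so on) return None, so the same check can double as the filter inside the extraction loop.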

zipList is a list of zip file paths. This is what I have so far:

import fnmatch
import os
import zipfile

for i in zipList:
    # Output folder: basePath/Raw_data/<zip file name without ".zip">
    if not os.path.exists(basePath + '/Raw_data/' + os.path.basename(i)[:-4]):
        os.makedirs(basePath + '/Raw_data/' + os.path.basename(i)[:-4])

    zipped_file = zipfile.ZipFile(i, 'r')
    for file in zipped_file.namelist():
        if fnmatch.fnmatch(file, "*.jp2"):
            # extract() recreates the member's folders under the target path
            zipped_file.extract(file, basePath + '/Raw_data/' + os.path.basename(i)[:-4])

But this recreates the zip file's internal folder structure under the target folder. I want only the .jp2 files, placed directly in basePath/Raw_data/os.path.basename(i)[:-4].
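The folders reappear because ZipFile.extract() writes each member to the target path joined with the member's full name inside the archive. One commonly used workaround, sketched here under the assumption that the archive is only being read, is to take each ZipInfo from infolist(), replace its filename with the bare file name, and pass that ZipInfo to extract(). extract_flat and its suffix parameter are illustrative names, not library API:

```python
import os
import zipfile


def extract_flat(zip_path: str, dest: str, suffix: str = ".jp2") -> list:
    """Extract members ending in `suffix` directly into `dest`, without their folders.
    (Hypothetical helper for illustration.)"""
    os.makedirs(dest, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            if info.is_dir() or not info.filename.lower().endswith(suffix):
                continue
            # extract() builds the output path from info.filename, so
            # replacing it with the bare file name drops the archive's folders.
            info.filename = os.path.basename(info.filename)
            extracted.append(z.extract(info, dest))  # returns the written path
    return extracted
```

Members that share a base name in different folders would overwrite each other in dest, so this only suits archives where the .jp2 names are unique.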


Solution

  • I figured it out:

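    import fnmatch
    import os
    import shutil
    import zipfile

    for i in zipList:
        # One output folder per archive: basePath/Raw_data/<zip name without extension>
        folder_path = os.path.join(basePath, 'Raw_data', os.path.splitext(os.path.basename(i))[0])
        os.makedirs(folder_path, exist_ok=True)
        with zipfile.ZipFile(i, 'r') as z:
            for file in z.namelist():
                # Keep only the blue, green, red and near-infrared bands (B02, B03, B04, B08)
                if fnmatch.fnmatch(file, "*.jp2") and file[-8:-4] in ["_B02", "_B03", "_B04", "_B08"]:
                    # Write the member under its bare file name, dropping the archive's folders
                    target_path = os.path.join(folder_path, os.path.basename(file))
                    with z.open(file) as src, open(target_path, 'wb') as target:
                        shutil.copyfileobj(src, target)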
    

    Instead of calling extract(), which recreates the archive's folders, I create a new file in the target folder named after the member's base name and copy the .jp2 member's contents into it.
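The band check with the [-8:-4] slice relies on the band code sitting right before .jp2, as in Level-1C names such as T32ABC_20200101T101421_B02.jp2. If you also handle Level-2A products, whose band files (as far as I know) carry a resolution suffix such as _B02_10m.jp2, that slice returns "_10m" and silently skips them. Here is a sketch of a regex-based check that assumes those two naming patterns; BAND_RE, band_of and wanted are hypothetical names:

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed patterns: "..._B02.jp2" (Level-1C style) and "..._B02_10m.jp2" (Level-2A style)
BAND_RE = re.compile(r"_(B\d{2}|B8A)(?:_\d+m)?\.jp2$", re.IGNORECASE)


def band_of(member: str) -> Optional[str]:
    """Return the band code (e.g. 'B02') of a .jp2 member name, or None."""
    m = BAND_RE.search(PurePosixPath(member).name)
    return m.group(1).upper() if m else None


def wanted(member: str, bands=("B02", "B03", "B04", "B08")) -> bool:
    """True if the member is one of the requested bands."""
    return band_of(member) in bands
```

Inside the loop, the band condition would then become wanted(file), and the list of bands can be changed in one place.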