Extract a list of files matching certain criteria within a subdirectory of a zip archive in Python


I want to access some .jp2 image files inside a zip file and create a list of their paths. The zip file contains a folder named S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE, and I am currently reading the files using glob after extracting that folder.

I don't want to have to extract the contents of the zip file first. I read that I cannot use glob inside a zip archive, nor can I use wildcards to access files within it, so I am wondering what my options are, apart from extracting to a temporary directory.

The way I am currently getting the list is this:

import glob

dirr = r'C:\path-to-folder\S2A_MSIL2A_20170420T103021_N0204_R108_T32UNB_20170420T103454.SAFE'

jp2_files = glob.glob(dirr + '/**/IMG_DATA/**/R60m/*B??_??m.jp2', recursive=True)

The directory also contains other .jp2 files, which is why I am using the glob wildcards to filter for the ones I need.

I am hoping to make this work so that I can automate it for many different zip archives. Any help is highly appreciated.


Solution

  • I made it work with zipfile and fnmatch

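    from zipfile import ZipFile
    import fnmatch

    # Path to the zip archive (placeholder)
    zip_path = 'path_to_zip.zip'

    with ZipFile(zip_path, 'r') as zipObj:
        # namelist() returns every member path without extracting anything
        file_list = zipObj.namelist()
        # In fnmatch, * also matches '/', so the leading * spans any folder depth
        pattern = '*/R60m/*B???60m.jp2'

        # Keep only the member paths that match the pattern
        filtered_list = []
        for file in file_list:
            if fnmatch.fnmatch(file, pattern):
                filtered_list.append(file)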
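
Since the goal is to repeat this for many archives, the same idea can be wrapped in a small helper. The following is only a sketch: the function name `list_matching_members` and the member names in the demo archive (`X.SAFE/...`) are made up for illustration, and the in-memory archive stands in for a real zip file. It uses `fnmatch.fnmatchcase` so that matching is case-sensitive on every platform, and it reuses the filename part of the glob pattern from the question.

```python
import fnmatch
import io
from zipfile import ZipFile


def list_matching_members(zip_source, pattern):
    """Return archive member names matching an fnmatch-style pattern.

    zip_source may be a path or a binary file-like object; nothing is extracted.
    (Hypothetical helper, not part of the zipfile module.)
    """
    with ZipFile(zip_source, 'r') as archive:
        # fnmatchcase skips the OS-dependent case folding done by fnmatch.fnmatch
        return [name for name in archive.namelist()
                if fnmatch.fnmatchcase(name, pattern)]


# Build a tiny in-memory archive with made-up member names for the demo.
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('X.SAFE/GRANULE/L2A/IMG_DATA/R60m/T32UNB_B02_60m.jp2', b'')
    archive.writestr('X.SAFE/GRANULE/L2A/IMG_DATA/R60m/T32UNB_SCL_60m.jp2', b'')
    archive.writestr('X.SAFE/GRANULE/L2A/IMG_DATA/R10m/T32UNB_B02_10m.jp2', b'')

# Only the band file inside R60m should survive the filter.
matches = list_matching_members(buffer, '*/R60m/*B??_??m.jp2')
print(matches)
```

With a real archive, passing the path to the zip file instead of `buffer` should behave the same way, and calling the helper in a loop over several zip paths would give one filtered list per archive.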