
Python zipfile module: get the base folder name using a regex


Assume the zip file "acme_example.zip" contains the following files and folders:

acme/one.txt
acme/one1.txt
acme/one2.txt
acme/one3.txt
acme/one4.txt
__MACOSX
.DS_Store

I am using the script below:

    import re
    from pathlib import Path
    from zipfile import ZipFile

    def get_base_folder():
        output_var = []
        skip_st = '__MACOSX'
        with ZipFile('acme_example.zip', 'r') as ZipObj:
            listfFiles = ZipObj.namelist()
            for elm in listfFiles:
                # first path component of each entry, e.g. 'acme'
                p = Path(elm).parts[0]
                if p not in output_var:
                    output_var.append(p)
            return re.sub(skip_st, '', ''.join(str(item) for item in output_var))

The script above excludes "__MACOSX", but is there a way to also exclude ".DS_Store" so that only "acme" is returned as the folder name?


Solution

  • Since you are already iterating over the names, it is better to exclude the unwanted ones right there in the loop. Also, the items are already strings, so the str(item) conversion in the join part is unnecessary.

    # same function body as in the question, with the skip check moved into the loop
    output_var = []
    skip_st = ['__MACOSX', '.DS_Store']
    with ZipFile('acme_example.zip', 'r') as ZipObj:
        listfFiles = ZipObj.namelist()
        for elm in listfFiles:
            p = Path(elm).parts[0]
            if p not in output_var and p not in skip_st:
                output_var.append(p)
        return ''.join(output_var)
    

    For reference, here is how you could filter at the end instead:

    • with a list

      skip_st = ['__MACOSX', '.DS_Store']
      # ...
      # keep only the items that are not in the skip list
      return ''.join(item for item in output_var if item not in skip_st)
      
    • with a pattern

      # raw string with an escaped dot, so '.' matches a literal dot only
      skip_st = r'__MACOSX|\.DS_Store'
      # ...
      return re.sub(skip_st, '', ''.join(output_var))
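
To try the approaches above without a real archive, here is a minimal, self-contained sketch. It assumes a small in-memory zip whose entries resemble the listing in the question; the helper names top_level_names and build_sample_zip, and the '__MACOSX/._one.txt' entry, are made up for illustration. It uses PurePosixPath because zip entry names use '/' as the separator, and re.escape to build the pattern so the dot in '.DS_Store' is treated literally.

```python
import io
import re
from pathlib import PurePosixPath
from zipfile import ZipFile

# Names to ignore, mirroring the skip list in the answer above.
SKIP_NAMES = ('__MACOSX', '.DS_Store')


def top_level_names(zip_source, skip=SKIP_NAMES):
    """Return the unique top-level entries of a zip, minus the skipped ones.

    Hypothetical helper: zip_source is a path or a file-like object.
    """
    names = []
    with ZipFile(zip_source, 'r') as zip_obj:
        for entry in zip_obj.namelist():
            # First path component, e.g. 'acme' for 'acme/one.txt'.
            top = PurePosixPath(entry).parts[0]
            if top not in skip and top not in names:
                names.append(top)
    return names


def build_sample_zip():
    """Build an in-memory zip with assumed sample entries (illustration only)."""
    buffer = io.BytesIO()
    with ZipFile(buffer, 'w') as zip_obj:
        for name in ('acme/one.txt', 'acme/one1.txt',
                     '__MACOSX/._one.txt', '.DS_Store'):
            zip_obj.writestr(name, '')
    buffer.seek(0)
    return buffer


# Filtering while iterating: only 'acme' survives.
base_folders = top_level_names(build_sample_zip())
print(base_folders)

# Pattern variant: escape each name so '.' is not a wildcard.
skip_pattern = '|'.join(re.escape(name) for name in SKIP_NAMES)
cleaned = re.sub(skip_pattern, '', 'acme__MACOSX.DS_Store')
print(cleaned)
```

Returning a list rather than a joined string is a design choice of this sketch: if an archive has more than one top-level folder, joining the names would glue them together, and the pattern variant could also strip a skipped name that happens to appear inside a legitimate folder name.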