Tags: python, regex, zip, python-re, python-zipfile

Python regex to extract a file from a zip archive when the filename must contain one pattern and must not contain another


I want to extract a single specific file from a zip archive that contains the 3 files below.
The filename should start with 'kpidata_nfile' and should not contain 'fileheader'.

kpidata_nfile_20220919-20220925_fileheader.csv
kpidata_nfile_20220905-20220911.csv
othername_kpidata_nfile_20220905-20220911.csv

Below is the code I have tried:

from zipfile import ZipFile
import re
import os
for x in os.listdir('.'):
  if re.match(r'.*\.(zip)', x):
      with ZipFile(x, 'r') as zip:
          for info in zip.infolist():
              if re.match(r'^kpidata_nfile_', info.filename):
                  zip.extract(info)

Output required - kpidata_nfile_20220905-20220911.csv


Solution

  • This regex does what you require:

    ^kpidata_nfile(?:(?!fileheader).)*$
    

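    The (?:(?!fileheader).)*$ part consumes the rest of the name one character at a time, and the negative lookahead (?!fileheader) rejects any position where 'fileheader' begins, so the match fails if that word appears anywhere after the prefix. See this answer for more about this construct.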

    You can see the regex working on your example filenames here.

    The regex is not particularly readable, so it might be better to use Python expressions instead of regex. Something like:

    fname = info.filename
    if fname.startswith('kpidata_nfile') and 'fileheader' not in fname:
        # zip is the open ZipFile from the loop in the question
        zip.extract(info)
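
To check that both approaches agree, here is a minimal, self-contained sketch. It is not the asker's actual script: the helper names (wanted_by_regex, wanted_by_str, extract_wanted, demo), the archive name sample.zip, and the out directory are all made up for illustration, and it assumes the files sit at the root of the archive (a member stored as some_dir/kpidata_nfile_... would not pass the prefix check).

```python
import re
import tempfile
from pathlib import Path
from zipfile import ZipFile

# Pattern from the answer: the name starts with the prefix, and
# 'fileheader' does not begin at any position after it.
PATTERN = re.compile(r'^kpidata_nfile(?:(?!fileheader).)*$')


def wanted_by_regex(fname):
    """Regex version of the filter (hypothetical helper name)."""
    return PATTERN.match(fname) is not None


def wanted_by_str(fname):
    """Plain string-method version of the same filter."""
    return fname.startswith('kpidata_nfile') and 'fileheader' not in fname


def extract_wanted(zip_path, dest_dir):
    """Extract matching members of zip_path into dest_dir; return their names."""
    extracted = []
    with ZipFile(zip_path, 'r') as archive:  # avoid shadowing the builtin zip
        for info in archive.infolist():
            if wanted_by_str(info.filename):
                archive.extract(info, path=dest_dir)
                extracted.append(info.filename)
    return extracted


def demo():
    """Build a throwaway archive with the three example names and extract from it."""
    names = [
        'kpidata_nfile_20220919-20220925_fileheader.csv',
        'kpidata_nfile_20220905-20220911.csv',
        'othername_kpidata_nfile_20220905-20220911.csv',
    ]
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / 'sample.zip'  # assumed name, illustration only
        with ZipFile(zip_path, 'w') as archive:
            for name in names:
                archive.writestr(name, 'placeholder\n')
        out_dir = Path(tmp) / 'out'
        extracted = extract_wanted(zip_path, out_dir)
        on_disk = sorted(p.name for p in out_dir.iterdir())
    return extracted, on_disk


if __name__ == '__main__':
    print(demo())
```

Keeping the filter in its own function makes it easy to test against a list of names without touching any real archive, and swapping wanted_by_str for wanted_by_regex inside extract_wanted should select the same file for these three names.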