python, regex, glob

Regex filenames, Python


I'm trying to get all files with Excel extensions, so I thought this pattern would select any file that has xls in its filename and pick up xls, xlsx, xlsm, etc.

path is a variable holding the folder I'm extracting these files from, and all_files stores the matching files. Shouldn't the /* match any file that has .xls in it? /*.xlsx or /*.xlsm works fine.

all_files=glob.glob(path + "/*.xls/*")

Solution

  • You are trying to get all files that have .xls in their names, and you're using the glob pattern:

    /*.xls/*
    

    This looks inside directories whose names end in .xls (note the / after .xls) and returns their contents; it does not match files whose names contain .xls.

    You need:

    glob.glob(path + "/*.xls*")
    

    But that is not precise, since it matches any file whose name merely contains the string .xls, e.g. foo.xlsbar.

    The problem is that a single standard shell glob pattern (even using [] and ?) cannot express an optional trailing character, so it is not as flexible as a regex here. You can filter the glob results with a regex check afterwards:

    import glob
    import re
    req = re.compile(r'\.xls[xm]?$')
    all_files = list(filter(lambda x: req.search(x), glob.iglob(path + '/*.xls*')))
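The same idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the name find_excel_files is made up for illustration, the match is case-sensitive like the snippet above, and glob.escape is used so that special characters in the folder name are not treated as wildcards.

```python
import glob
import os
import re

# Same regex as above: .xls optionally followed by x or m, at the end of the name.
EXCEL_EXT = re.compile(r'\.xls[xm]?$')


def find_excel_files(path):
    """Return the sorted .xls/.xlsx/.xlsm files directly inside `path`.

    Hypothetical helper: a broad glob narrows the candidates first,
    then the regex rejects names such as foo.xlsbar.
    """
    pattern = os.path.join(glob.escape(path), '*.xls*')
    return sorted(
        f for f in glob.iglob(pattern)
        if os.path.isfile(f) and EXCEL_EXT.search(f)
    )
```

Calling all_files = find_excel_files(path) should then give the same kind of list as the one-liner above, with directories and near-miss names filtered out.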