python-3.x glob

List Files from Directory with More than One Period


I'm trying to list files from a directory. The problem is that some of these files have numbers after their extensions that look like sample1.csv.1. When I try to list files in the directory, these files are omitted from the list. This is not a problem for files like sample.txt.1. I've tried these 3 approaches:

import os 
path = 'C:\\my\\path\\here'
missingFiles1 = os.listdir(path)

missingFiles2 = []
for files in os.walk(path):
    missingFiles2.append(files)

import glob
missingFiles3 = []
withRegex = path + "\\sample*.[0-9]"  # Actual file starts with an L
for files in glob.glob(withRegex):
    missingFiles3.append(files)

I tried an iterator too but have already forgotten what the code looked like. I got some good pointers here but I couldn't get it to work. Any help would be greatly appreciated.

Using:

Python 3.6.8

glob3 0.0.1


Solution

  • \sample.[0-9] matches sample.1, but not sample.txt.1, sample1.txt, or sample1.txt.1.

    What you're looking for is not possible to express using glob patterns.

    You could get all files that start with sample and end in any extension followed by a dot and a single digit using this glob pattern:

    sample*.*.[0-9]

    This matches any file name starting with sample, followed by anything, then a dot, then anything, then another dot and a single digit.

    Glob patterns are not regular expressions, so you can't make one match "one or none" of a specific character.
    Glob only knows:
    * => any number of characters, including none
    ? => exactly one character
    [ab] => either a or b (exactly one character)
    [!ab] => any one character except a or b
    [0-9] => any one digit from 0 to 9
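To see how these wildcards behave without touching the file system, here is a minimal sketch that runs the two patterns from this answer through fnmatch.fnmatchcase, the standard-library matcher for the same pattern syntax. The file names and the matching() helper are made up for illustration. Note that glob.glob itself may behave slightly differently, for example with names that start with a dot or with case handling on Windows.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen to mirror the ones discussed in this thread.
names = [
    "sample.1",
    "sample.txt.1",
    "sample1.txt",
    "sample1.txt.1",
    "sample1.csv.1",
    "findfiles.py",
]


def matching(pattern, candidates):
    """Return the candidates that match a glob-style pattern (case-sensitive)."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# "sample.[0-9]" has no wildcard between "sample" and the dot,
# so only a name of the form sample.<digit> fits.
narrow = matching("sample.[0-9]", names)

# "sample*.*.[0-9]" needs at least two dots and a single trailing digit.
wide = matching("sample*.*.[0-9]", names)

print(narrow)  # ['sample.1']
print(wide)    # ['sample.txt.1', 'sample1.txt.1', 'sample1.csv.1']
```

The second pattern picks up sample1.csv.1 as well, because * also matches the digit in front of the first dot.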

    If I understand you correctly, you want to match an optional number in front of the first extension, followed by a number as the second extension.

    import glob
    import os

    path = 'C:\\Users\\mastacheata\\test'

    # Approach 1: names of all entries in the directory
    missingFiles1 = os.listdir(path)

    # Approach 2: os.walk yields (dirpath, dirnames, filenames) tuples
    missingFiles2 = []
    for files in os.walk(path):
        missingFiles2.append(files)

    # Approach 3: glob pattern (not a regex); actual file starts with an L
    missingFiles3 = []
    withRegex = path + "/sample*.*.[0-9]"
    for files in glob.glob(withRegex):
        missingFiles3.append(files)

    print(missingFiles3)
    

    C:\Users\mastacheata\test contains 3 files: findfiles.py, sample1.txt.1 and sample.txt.1.

    This is the output:

    ['C:\\Users\\mastacheata\\test\\sample.txt.1', 'C:\\Users\\mastacheata\\test\\sample1.txt.1']
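If the "one digit or none" part has to be enforced exactly, one option outside glob is to filter the names with a regular expression. This is only a sketch under an assumed naming rule (sample, an optional single digit, an extension without dots, then a dot and one digit). NAME_RE, numbered_files and numbered_files_in are hypothetical names, so adjust the pattern to your real file names.

```python
import os
import re

# Assumed naming rule: "sample", an optional single digit, a dot,
# an extension without dots, another dot, and one trailing digit.
NAME_RE = re.compile(r"sample[0-9]?\.[^.]+\.[0-9]")


def numbered_files(names):
    """Keep only the names that match NAME_RE in full."""
    return [name for name in names if NAME_RE.fullmatch(name)]


def numbered_files_in(directory):
    """Apply the same filter to the entries of a directory (hypothetical helper)."""
    return numbered_files(sorted(os.listdir(directory)))


candidates = ["findfiles.py", "sample.txt.1", "sample1.txt.1", "sample12.txt.1", "sample.1"]
print(numbered_files(candidates))  # ['sample.txt.1', 'sample1.txt.1']
```

Unlike sample*.*.[0-9], this rejects sample12.txt.1, because [0-9]? allows at most one digit before the first dot.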