python pathlib

os walk with specific extension


I want to loop over only the files whose extension is CR2, CR3, cr2, or cr3 (that is, the extension contains "cr"). Currently I am using os.walk(), but people recommend pathlib, where I can do something like path.glob('*.jpg'). However, I still cannot express this condition that way. Is there a better way to do this?

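import os
from pathlib import Path

for root, dirs, files in os.walk(cfg.RAWIMG_DIR):
    dirs.sort()  # visit subdirectories in sorted order
    for file in files:
        # assumption: `dir` was meant to be the current directory's name (string comparison)
        if Path(root).name > '10':
            path = Path(root) / file
            if 'cr' in path.suffix.lower():
                ...  # process the file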

Additionally, since there are too many files, I would like to process them in batches (say, 10 files at a time) during the loop. This is why I need the list of directory names as well.


Solution

  • Using pathlib, you could do this:

    from pathlib import Path
    
    path = Path(cfg.RAWIMG_DIR)
    for file in path.glob("**/*.[Cc][Rr][23]"):
        print(file)
    

    The ** part of the glob matches path itself and all of its subdirectories, recursively.

    Note that this glob also matches mixed-case suffixes such as *.cR3 and *.Cr2. I am not sure whether you want that.


    If you want to match the suffixes exactly, you could do this:

    from pathlib import Path
    
    path = Path(cfg.RAWIMG_DIR)
    suffixes = {".cr2", ".cr3", ".CR2", ".CR3"}
    for file in path.glob("**/*.*"):
        if file.suffix in suffixes:
            print(file)
    

    Note that both versions print the full path of each file.

    Both versions are also lazy, because glob returns a generator. You could wrap either one in a generator function and use batched (from the Itertools Recipes; available as itertools.batched since Python 3.12) to get batches of files to work on.
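    The batching idea could be sketched roughly as follows. This is only a minimal sketch: the names iter_raw_files and RAW_SUFFIXES are made up for illustration, and batched here is a small local stand-in for the itertools recipe, so it also runs on Python versions older than 3.12.

```python
from itertools import islice
from pathlib import Path

# Hypothetical name: the exact suffixes to accept (no mixed case).
RAW_SUFFIXES = {".cr2", ".cr3", ".CR2", ".CR3"}


def iter_raw_files(root):
    """Lazily yield files under root whose suffix is in RAW_SUFFIXES."""
    for file in Path(root).glob("**/*"):
        if file.is_file() and file.suffix in RAW_SUFFIXES:
            yield file


def batched(iterable, n):
    """Yield tuples of up to n items; the last tuple may be shorter."""
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
```

    Assuming cfg.RAWIMG_DIR from the question, a loop such as `for batch in batched(iter_raw_files(cfg.RAWIMG_DIR), 10):` would then receive up to 10 paths per iteration. The directory of each file is available as file.parent, so a separate list of directory names is not needed. Note that glob does not guarantee any particular order; wrap the generator in sorted() if order matters, at the cost of collecting all the paths before the first batch is produced.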