Tags: python, python-3.x, pathlib

How to use the path returned by Path.iterdir() to scan the files using rglob() function?


I have a main directory which has many subdirectories. Each subdirectory further has various PNG files and a sub-subdirectory. My problem is summarised in the following code snippet.

# import
from pathlib import Path, PurePath
from google.colab import drive

# mount drive
drive.mount('/content/drive')

for path in Path('/content/drive/MyDrive/Cape-Windy/Annotated-frames_COCO/').iterdir():
  print(PurePath(Path(path).rglob('*.PNG')).stem)

This generates

TypeError: expected str, bytes or os.PathLike object, not generator

Further experimentation:

for path in Path('/content/drive/MyDrive/Cape-Windy/Annotated-frames_COCO/').iterdir():
  print(path)

prints the paths of the sub-directories

/content/drive/MyDrive/Cape-Windy/Annotated-frames_COCO/Cape_Windy-546053-processing_1-2020-10-25-19-29-44-759-coco-1.0.zip ...

and

for path in Path('/content/drive/MyDrive/Cape-Windy/Annotated-frames_COCO/').iterdir():
  print(Path(path).rglob('*.PNG'))

gives

<generator object Path.rglob at 0x7f1c9b7d2f20> ...

Based on the documentation, I suspect that what's being returned by Path.iterdir() may be a PosixPath instance, but I'm not sure what type is needed to make my code work. Any suggestions would be appreciated. The code is written in Python 3 in Google Colab, and all the data is in Google Drive.


Solution

  • The error occurs because Path.rglob() returns a generator that yields paths, not a single path. PurePath() expects a str or os.PathLike object, so passing it the generator raises the TypeError. The objects yielded by Path.iterdir() are already Path instances (PosixPath on Colab), so nothing needs converting there. To get the base name of each PNG file without its extension, loop over the generator and take the stem of each file. This should solve the issue:

    from pathlib import Path

    from google.colab import drive  # Colab-only helper for mounting Google Drive

    # mount the drive
    drive.mount('/content/drive')

    base_dir = Path('/content/drive/MyDrive/Cape-Windy/Annotated-frames_COCO/')

    # loop through each subdirectory
    for subdir in base_dir.iterdir():
        # loop through all PNG files in the subdirectory and its subdirectories
        for png_file in subdir.rglob('*.PNG'):
            # rglob() yields Path objects, so .stem is available directly
            print(png_file.stem)
    

    This code loops through each subdirectory; for each one it loops through all PNG files (including those in sub-subdirectories) and prints each file's stem, which is the base name without the extension. :)
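
If you want to try the idea without a mounted Drive, here is a minimal sketch that builds a throwaway directory tree with tempfile and runs the same nested loop over it. The helper name collect_png_stems and all folder and file names are made up for illustration. The is_dir() check is an addition, not part of the code above: it skips any plain files that sit directly inside the base directory.

```python
import tempfile
from pathlib import Path


def collect_png_stems(base_dir: Path, pattern: str = "*.PNG") -> list[str]:
    """Return the sorted stems of files matching `pattern` under each subdirectory.

    Hypothetical helper written for this example only.
    """
    stems = []
    for subdir in base_dir.iterdir():
        if not subdir.is_dir():
            continue  # skip plain files located directly in base_dir
        # rglob() returns an iterator of Path objects, so iterate over it
        for png_file in subdir.rglob(pattern):
            stems.append(png_file.stem)
    return sorted(stems)


# Build a small fake layout: two subdirectories, one with a nested folder.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "batch_a" / "nested").mkdir(parents=True)
    (base / "batch_b").mkdir()
    (base / "batch_a" / "frame_001.PNG").touch()
    (base / "batch_a" / "nested" / "frame_002.PNG").touch()
    (base / "batch_b" / "frame_003.PNG").touch()
    (base / "batch_b" / "notes.txt").touch()  # not a PNG, should be ignored

    stems = collect_png_stems(base)
    print(stems)  # ['frame_001', 'frame_002', 'frame_003']
```

On Linux, which Colab runs on, the pattern is matched case-sensitively, so '*.PNG' will not pick up files ending in .png. If your files use a lowercase extension, pass pattern="*.png" instead.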