
Preserving all the subdirectories when saving a file - Python


I have a folder with various subdirectories, and within those are files with different extensions such as .jpg, .png, etc. I would like to extract only the .jpg files, perform some work on them (cropping, etc.), and then save the new, cropped files. However, I don't want these new files saved into the same folders they came from. Instead, I want them saved into a different folder that contains the same subdirectories, with the same names, as the original folder, with each cropped file going into the subdirectory that matches its source. This new folder must contain only the new cropped files. I have included an illustration below to better explain my question.

[illustration not available]

I have tried the following:

for imgs in glob.iglob(self.Main_Folder + '//**/*.jpg', recursive=True):
    Output_Folder = os.path.join(os.path.dirname(imgs), "cropped" + str(idx) + ".jpg")

However, this overwrites the .jpg files located in their subdirectories in the Main Folder (i.e. it overwrites 1.jpg, 2.jpg, etc.). Any help on this would be much appreciated!
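
A minimal illustration of why the attempt writes back into the source tree: os.path.dirname keeps the whole Main Folder prefix of the globbed path, so joining onto it stays inside Main Folder. The path and the idx value below are hypothetical, and posixpath (the POSIX flavour of os.path) is used only so the result looks the same on every OS.

```python
import posixpath  # POSIX flavour of os.path, so the output is identical on every OS

# Hypothetical values: a path shaped like what glob.iglob yields when
# Main_Folder is "Main Folder", plus a made-up counter (the snippet in the
# question does not show where idx comes from).
imgs = "Main Folder/Sub1/1.jpg"
idx = 0

# dirname() keeps the "Main Folder/Sub1" prefix, so the output path points
# back into the source tree instead of into a separate output folder.
out = posixpath.join(posixpath.dirname(imgs), "cropped" + str(idx) + ".jpg")
print(out)  # Main Folder/Sub1/cropped0.jpg
```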


Solution

  • One thing to realize is that the paths glob.iglob yields start with the directory part of your pattern, i.e. your self.Main_Folder. So you must first compute each file's sub-path relative to the main folder and then join that with the output folder, e.g. "main_folder/found/path.jpg" -> "found/path.jpg" -> "output_folder/found/path.jpg". The pathlib library provides an easy way to do the first step with Path.relative_to.

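    import glob
    import os
    import pathlib
    
    main_folder = "somewhere/Main Folder"
    output_folder = "somewhere/Output Folder"
    # recursive=True is needed for "**" to match subdirectories at any depth;
    # "*.jpg" restricts the matches to JPEG files.
    for name in glob.iglob(os.path.join(main_folder, "**", "*.jpg"), recursive=True):
        path = pathlib.Path(name)
        # "Main Folder/Sub/1.jpg" -> "Sub/1.jpg"
        sub_path = path.relative_to(main_folder)
        output_path = os.path.join(output_folder, sub_path)
        # Create the matching subdirectory in the output folder before saving there.
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        print(output_path)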
    

    You can go further with the pathlib API, especially joining paths with the / operator (which only works once main_folder and output_folder are pathlib.Path objects), but whether this is clearer is debatable:

    main_folder = pathlib.Path("somewhere/Main Folder")
    output_folder = pathlib.Path("somewhere/Output Folder")
    # The / operator works on Path objects, not on plain strings.
    for name in glob.iglob(str(main_folder / "**" / "*.jpg"), recursive=True):
        path = pathlib.Path(name)
        output_path = output_folder / path.relative_to(main_folder)
        print(output_path)
    

    In Python 3.10 and later, glob.iglob also accepts a root_dir argument and yields paths relative to it, so the relative_to step is no longer needed:

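    main_folder = pathlib.Path("somewhere/Main Folder")
    output_folder = pathlib.Path("somewhere/Output Folder")
    # With root_dir, each name is already relative to main_folder, e.g. "Sub/1.jpg".
    for name in glob.iglob("**/*.jpg", root_dir=main_folder, recursive=True):
        output_path = output_folder / name
        print(output_path)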
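
Putting the pieces together, here is a minimal end-to-end sketch, assuming the cropping step can be wrapped in a function that reads one source file and writes one destination file. crop_image and mirror_jpgs are hypothetical names, and crop_image merely copies the file as a placeholder for real image processing. Output subdirectories are created on demand, so the output folder only receives folders that actually get a cropped file.

```python
from __future__ import annotations

import glob
import os
import shutil
import tempfile
from pathlib import Path


def crop_image(src: Path, dst: Path) -> None:
    """Hypothetical placeholder for the real cropping step.

    It only copies the file; swap in your own image-processing code that
    reads src and writes the cropped result to dst.
    """
    shutil.copyfile(src, dst)


def mirror_jpgs(main_folder: Path, output_folder: Path) -> list[Path]:
    """Process every .jpg under main_folder into the same relative spot under output_folder."""
    written = []
    # recursive=True makes "**" match any depth; glob.escape stops folder names
    # containing glob metacharacters such as "[" from being read as patterns.
    pattern = os.path.join(glob.escape(str(main_folder)), "**", "*.jpg")
    for name in glob.iglob(pattern, recursive=True):
        src = Path(name)
        # e.g. "Main Folder/Sub1/1.jpg" -> "Sub1/1.jpg" -> "Output Folder/Sub1/1.jpg"
        dst = output_folder / src.relative_to(main_folder)
        # Create the mirrored subdirectory only when a file is written into it.
        dst.parent.mkdir(parents=True, exist_ok=True)
        crop_image(src, dst)
        written.append(dst)
    return written


if __name__ == "__main__":
    # Build a throwaway tree that includes one .png, which should be skipped.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        main = root / "Main Folder"
        out = root / "Output Folder"
        for rel in ["Sub1/1.jpg", "Sub1/notes.png", "Sub2/2.jpg"]:
            f = main / rel
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_bytes(b"not really a JPEG")
        for path in mirror_jpgs(main, out):
            print(path.relative_to(root))
```

On POSIX systems glob matches "*.jpg" case-sensitively, so files named like 1.JPG would need an additional pattern.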