
How to scan only some directories or exclude directories while using os.walk()


I need to either exclude a few directories or scan only some of them while using os.walk(). I am trying to get the most recently modified files. I learned how to do this from this post, but it only returns one file. For my project I need a list of the 5 (or more) most recent files. This post shows how to scan only a few directories, but I have no idea how to combine it with the answer from the first post.

I want to exclude the directory that contains the most recently modified file. If Folder 3 holds the most recently modified file, then on the next scan, when I look for the 2nd, 3rd, or later most recent file, I want to exclude that directory.

Here is my file layout:

MainFile(CurrentOne)
|
|-- Projects(the one I am scanning)
    #the following folders all have images in them but they are created at the same time as the folder
    |-- Folder 1
    |
    |-- Folder 2  
    |
    |-- Folder 3
    |
    |-- etc...

My previous approach was:

I can't show the code because I have deleted it, but I can explain it:

First: I would get a list of the dirs in the folder using os.listdir(Projects)

Second: I would check whether there are more than 5 dirs, or 5 or fewer

Third: I would go into each folder (I had put them in a list in the first step) and use stats = os.stat(dirname) to get info about it.

Fourth: I put all of the info in a list using recent.insert(0, stats[8])

Lastly: I would compare all the times and pick 5 of them, but the results are all incorrect.
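
The steps above can be sketched roughly as follows. This is a reconstruction from the description only, not the deleted code, and the function name `newest_dirs_by_own_mtime` and its `limit` parameter are hypothetical. Note that `stats[8]` is the same value as `stats.st_mtime`, and that on most file systems a directory's own modification time changes only when entries are added, removed, or renamed inside it, not when an existing file in it is edited. That may be one reason the times looked wrong.

```python
import os

def newest_dirs_by_own_mtime(projects_dir, limit=5):
    """Return up to `limit` subdirectory names, most recently modified first.

    Hypothetical helper: it ranks folders by the directory's OWN st_mtime
    (the value stats[8] refers to), not by the files inside them.
    """
    entries = []
    for name in os.listdir(projects_dir):
        path = os.path.join(projects_dir, name)
        if os.path.isdir(path):
            # os.stat(path)[8] and os.stat(path).st_mtime are the same number
            entries.append((os.stat(path).st_mtime, name))
    entries.sort(reverse=True)  # newest first
    return [name for _mtime, name in entries[:limit]]
```

To rank folders by the files they contain, the modification time has to be read from each file instead, which is what the solution below does.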

Edit

Once I get the most recently modified file, I want to exclude its directory from being scanned, or scan only the other directories. For example, suppose Folder 1 was modified most recently and Python displayed Folder 1. I would then want to exclude that directory while scanning for the second most recently modified directory.


Solution

  • After reading @tripleee's comment, I wrote this code, which gets the most recently modified files.

    import os
    
    os.chdir('Folder')
    projloc = os.getcwd()  # the folder to scan
    
    list_of_dirs_to_exclude = []
    
    def get_recent_files():
        max_mtime = 0
        max_dir = None
        max_file = None
    
        for root, dirs, files in os.walk(projloc):
            # Skip directories already reported. I added the `not` here, unlike @tripleee's answer
            if root not in list_of_dirs_to_exclude:
                for fname in files:
                    full_path = os.path.join(root, fname)
                    mtime = os.stat(full_path).st_mtime
                    if mtime > max_mtime:
                        max_mtime = mtime
                        max_dir = root
                        max_file = fname
    
        if max_dir is None:
            return  # no files left outside the excluded directories
    
        list_of_dirs_to_exclude.insert(0, max_dir)
        print(max_file)
    
        # Stop after 5 directories. You can use whatever number you want, such as 4, 6, 7 etc.
        if len(list_of_dirs_to_exclude) < 5:
            get_recent_files()
    
    get_recent_files()
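
One caveat about the `root not in list_of_dirs_to_exclude` check above: it skips the files directly inside an excluded directory, but os.walk() still descends into that directory, so its subdirectories are still scanned. A minimal sketch of an alternative is to prune the `dirs` list in place, which os.walk() supports in its default top-down mode. The helper name `walk_pruned` and its parameters are hypothetical, chosen only for illustration.

```python
import os

def walk_pruned(top, excluded):
    """Yield (root, files) like os.walk(top), never entering paths in `excluded`.

    Hypothetical helper. `excluded` is an iterable of directory paths.
    """
    excluded = {os.path.abspath(p) for p in excluded}
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the very list os.walk() iterates over next,
        # so any directory removed here is never visited at all.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) not in excluded]
        yield root, files
```

Rebinding the name with `dirs = [...]` would not work here, because os.walk() keeps using the original list object. The slice assignment `dirs[:] = ...` is what makes the pruning take effect.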
    

    Here is an updated version if you want all of the code in a single function:

    def get_recent_files():
        list_of_dirs_to_exclude = []
    
        # projloc is predefined for me. I got it using the same method as in the code above
        while len(list_of_dirs_to_exclude) < 5:
            # Reset the search state before every pass over the tree
            max_mtime = 0
            max_dir = None
            max_file = None
    
            for root, dirs, files in os.walk(projloc):
                if root not in list_of_dirs_to_exclude:
                    for fname in files:
                        full_path = os.path.join(root, fname)
                        mtime = os.stat(full_path).st_mtime
                        if mtime > max_mtime:
                            max_mtime = mtime
                            max_dir = root
                            max_file = fname
    
            if max_dir is None:
                break  # no files left outside the excluded directories
    
            list_of_dirs_to_exclude.insert(0, max_dir)
            print(max_file)
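
Both versions above walk the whole tree once per result. As a possible alternative, here is a sketch that walks the tree a single time, records the newest file for each top-level folder, and then picks the most recent entries with `heapq.nlargest`. It assumes the layout from the question (project folders directly under `projloc`) and, unlike the code above, counts files in nested subfolders toward their top-level folder. The names `newest_file_per_dir` and `most_recent` are hypothetical.

```python
import heapq
import os

def newest_file_per_dir(projloc):
    """Map each top-level folder name to (mtime, path) of its newest file."""
    newest = {}
    for root, dirs, files in os.walk(projloc):
        rel = os.path.relpath(root, projloc)
        if rel == os.curdir:
            continue  # files directly in projloc belong to no project folder
        top = rel.split(os.sep)[0]  # e.g. "Folder 1" for "Folder 1/sub"
        for fname in files:
            path = os.path.join(root, fname)
            mtime = os.stat(path).st_mtime
            if top not in newest or mtime > newest[top][0]:
                newest[top] = (mtime, path)
    return newest

def most_recent(projloc, count=5):
    """Return up to `count` (mtime, path) pairs, newest first, one per folder."""
    return heapq.nlargest(count, newest_file_per_dir(projloc).values())
```

Calling `most_recent(projloc)` returns at most one file per folder, so no exclusion list is needed: a folder that has already contributed its newest file cannot appear twice.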