python for-loop listdir

Traverse a directory and do the same thing at every level: how to optimize?


I have a directory whose structure looks like this:

+-- folder1
    | +-- folder2
        | +-- A.py
        | +-- A.txt
    | +-- folder3
        | +-- folder4
            | +-- B.py
            | +-- B.txt
    | +-- C.py
    | +-- C.txt

I want to find all the .py files under folder1 and print each one's relative path with the components joined by _. For example, B.py should become folder1_folder3_folder4_B.py. Here is what I have so far:

import os

folder1 = 'folder1'  # root folder to search
file_list = os.listdir(folder1)
for file in file_list:
    if len(file.split('.')) == 2 and file.split('.')[-1] == 'py':  # C.py
        print(folder1 + '_' + file)
    elif len(file.split('.')) == 2 and file.split('.')[-1] != 'py':  # C.txt
        pass
    else:
        file1_list = os.listdir(os.path.join(folder1, file))
        for file1 in file1_list:
            if len(file1.split('.')) == 2 and file1.split('.')[-1] == 'py':  # A.py
                print(folder1 + '_' + file + '_' + file1)
            elif len(file1.split('.')) == 2 and file1.split('.')[-1] != 'py':  # A.txt
                pass
            else:
                file2_list = os.listdir(os.path.join(folder1, file, file1))
                for file2 in file2_list:
                    if len(file2.split('.')) == 2 and file2.split('.')[-1] == 'py':  # B.py
                        print(folder1 + '_' + file + '_' + file1 + '_' + file2)
                    elif len(file2.split('.')) == 2 and file2.split('.')[-1] != 'py':  # B.txt
                        pass
                    else:
                        pass  # Actually I don't know what to write here

There are two problems:

(1) I don't know how deep to nest the for loops, although I could compute the maximum depth of folder1.

(2) Every level repeats the same operations, so this can obviously be optimized.

Does anyone have a good answer?


Solution

  • os.walk recursively walks a directory tree, fnmatch.fnmatch matches file names against wildcards, and os.path.relpath reduces a full path to the part relative to a given root.

    Given testdir:

    C:\TESTDIR
    └───folder1
        │   C.py
        │   C.txt
        ├───folder2
        │       A.py
        │       A.txt
        └───folder3
            └───folder4
                    B.py
                    B.txt
    

    and code:

    import os
    from fnmatch import fnmatch
    
    def magic(root):
        for path,dirs,files in os.walk(root):
            # path relative to root; '' for root itself, since relpath would return '.'
            relpath = '' if root == path else os.path.relpath(path,root)
            for file in files:
                if fnmatch(file,'*.py'):
                    name = os.path.join(relpath,file)
                    yield name.replace(os.path.sep,'_')
    
    root = r'.\testdir' # A path that starts with . for testing
    
    for name in magic(root):
        print(name)
    

    Output:

    folder1_C.py
    folder1_folder2_A.py
    folder1_folder3_folder4_B.py
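
To see why magic special-cases the root, it can help to call the helpers on their own. The snippet below is only a sketch: the path names are illustrative and do not need to exist on disk, because os.path.relpath and fnmatch operate on strings.

```python
import os
from fnmatch import fnmatch

# Illustrative paths only; nothing here needs to exist on disk.
root = os.path.join('.', 'testdir')
sub = os.path.join(root, 'folder1', 'folder3')

# relpath strips the root prefix and leaves only the subfolder part.
rel_sub = os.path.relpath(sub, root)
print(rel_sub.split(os.path.sep))   # ['folder1', 'folder3']

# For the root itself relpath returns '.', which is why magic() substitutes ''.
rel_root = os.path.relpath(root, root)
print(rel_root)                     # .

# fnmatch applies shell-style wildcards to a bare file name.
matches = [f for f in ['A.py', 'A.txt'] if fnmatch(f, '*.py')]
print(matches)                      # ['A.py']
```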
    

    One caveat: consider what should happen if a file or folder name already contains an underscore, since the flattened name then becomes ambiguous 😊
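
The underscore caveat can be made concrete. The sketch below uses a hypothetical helper, find_collisions, which is not part of the answer above: it takes relative paths as lists of components, flattens each with _, and reports any flattened name produced by more than one path. The sample paths are made up for illustration.

```python
from collections import defaultdict

def find_collisions(paths, sep='_'):
    """Group relative paths (lists of components) by their flattened name.

    Hypothetical helper for illustration: returns only the flattened names
    that more than one path maps to.
    """
    by_name = defaultdict(list)
    for parts in paths:
        by_name[sep.join(parts)].append(parts)
    return {name: group for name, group in by_name.items() if len(group) > 1}

# Made-up example: a folder named 'a' next to a file named 'a_b.py'.
paths = [
    ['folder1', 'a', 'b.py'],
    ['folder1', 'a_b.py'],
    ['folder1', 'C.py'],
]

collisions = find_collisions(paths)
print(collisions)
# {'folder1_a_b.py': [['folder1', 'a', 'b.py'], ['folder1', 'a_b.py']]}
```

If such collisions matter for your tree, one option is to pick a separator that does not occur in any file or folder name; whether that is practical depends on how the flattened names are used afterwards.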