Search code examples
pythontensorflowdirectorysubdirectory

Running all Python scripts with the same name across many directories


I have a file structure that looks something like this:

Master:

  • First
    • train.py
    • other1.py
  • Second
    • train.py
    • other2.py
  • Third
    • train.py
    • other3.py

I want to be able to have one Python script that lives in the Master directory that will do the following when executed:

  • Loop through all the subdirectories (and their subdirectories if they exist)
  • Run every Python script named train.py in each of them, in whatever order necessary

I know how to execute a given python script from another file (given its name), but I want to create a script that will execute whatever train.py scripts it encounters. Because the train.py scripts are subject to being moved around and being duplicated/deleted, I want to create an adaptable script that will run all those that it finds.

How can I do this?


Solution

  • You can use os.walk to recursively collect all train.py scripts and then run them in parallel using ProcessPoolExecutor and the subprocess module.

    import os
    import subprocess
    
    from concurrent.futures import ProcessPoolExecutor
    
    def list_python_scripts(root):
        """Finds all 'train.py' scripts in the given directory recursively."""
        scripts = []
        for root, _, filenames in os.walk(root):
            scripts.extend([
                os.path.join(root, filename) for filename in filenames
                if filename == 'train.py'
            ])
        return scripts
    
    
    def main():
        # Make sure to change the argument here to the directory you want to scan.
        scripts = list_python_scripts('master')
        with ProcessPoolExecutor(max_workers=len(scripts)) as pool:
            # Run each script in parallel and accumulate CompletedProcess results.
            results = pool.map(subprocess.run,
                               [['python', script] for script in scripts])
    
        for result in results:
            print(result.returncode, result.stdout)
    
    
    if __name__ == '__main__':
        main()