Search code examples
pythonlinuxmultithreadingparallel-processingjoblib

Running the code on all sub-directories simultaneously


I have twelve sub-directories and I have a python code to be run for each sub-directory. If I run it one by one it takes long time. So, I want to run the same code on all twelve sub-directories simultaneously.

My computer has two physical CPUs, each of 12 cores.

I tried the code as follows.

Each subdirectory has several files to be processed.

The following code works for each sub_directory one by one.

import os, glob
from concurrent.futures import ThreadPoolExecutor

wd = "/data0/"
sub_directories = [wd + "jan/", wd + "feb/",wd + "mar/",wd + "apr/",wd + "may/",wd + "jun/",wd + "jul/",wd + "aug/",wd + "sep/",wd + "oct/",wd + "nov/",wd + "dec/"]

for sub_directory in sub_directories:
    files = glob.glob (sub_dir + "*.txt")

    def processor (file):
        data = np.fromfile(file)
        #do some calculations
        result = np.array()
        np.save(result, file + ".npy")
    with ThreadPoolExecutor(max_workers=5) as tpe:
        tpe.map(processer, files)   

How can I run this code on all twelve sub-directories simultaneously?


Solution

  • Change your processor() function such that it handles one month at a time. That function should run the glob()

    from glob import glob
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    from os.path import join
    from calendar import month_name
    
    EXECUTOR = ProcessPoolExecutor # set this to ThreadPoolExecutor for multithreading
    WD = '/data0'
    MONTHS = [m.lower()[:3] for m in month_name[1:]]
    
    def processor(month):
        for file in glob(join(WD, month, '*.txt')):
            pass # process the file here
    
    def main():
        with EXECUTOR() as executor:
            executor.map(processor, MONTHS)
    
    if __name__ == '__main__':
        main()
    

    ThreadPoolExecutor and ProcessPoolExecutor are interchangeable so just change EXECUTOR to whichever is your preferred mechanism