I have twelve sub-directories and a Python script that has to be run on each of them. Running it on one sub-directory after another takes a long time, so I want to run the same code on all twelve sub-directories simultaneously.
My computer has two physical CPUs, each with 12 cores.
Each sub-directory contains several files to be processed.
The following code is what I tried; it processes the sub-directories one by one.
import glob
from concurrent.futures import ThreadPoolExecutor

import numpy as np

wd = "/data0/"
sub_directories = [wd + "jan/", wd + "feb/", wd + "mar/", wd + "apr/", wd + "may/", wd + "jun/", wd + "jul/", wd + "aug/", wd + "sep/", wd + "oct/", wd + "nov/", wd + "dec/"]

def processor(file):
    data = np.fromfile(file)
    # do some calculations
    result = data  # placeholder: replace with the computed array
    np.save(file + ".npy", result)  # np.save takes the file name first, then the array

for sub_directory in sub_directories:
    files = glob.glob(sub_directory + "*.txt")
    # the files of one sub-directory are handled by 5 threads
    with ThreadPoolExecutor(max_workers=5) as tpe:
        tpe.map(processor, files)
How can I run this code on all twelve sub-directories simultaneously?
Change your processor() function so that it handles one month at a time. That function should run the glob() itself:
from glob import glob
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from os.path import join
from calendar import month_name

EXECUTOR = ProcessPoolExecutor  # set this to ThreadPoolExecutor for multithreading
WD = '/data0'
MONTHS = [m.lower()[:3] for m in month_name[1:]]

def processor(month):
    for file in glob(join(WD, month, '*.txt')):
        pass  # process the file here

def main():
    with EXECUTOR() as executor:
        executor.map(processor, MONTHS)

if __name__ == '__main__':
    main()
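ThreadPoolExecutor and ProcessPoolExecutor share the same interface, so just set EXECUTOR to whichever mechanism you prefer. As a rule of thumb, processes suit CPU-bound work, because threads in CPython are limited by the GIL, while threads suit I/O-bound work.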
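One detail that may be worth adding: executor.map() returns an iterator, and an exception raised in a worker is only re-raised when its result is retrieved from that iterator. If the results are never consumed, a failing month can go unnoticed. The following is a minimal sketch of one way to collect the results, not a definitive implementation. The names count_txt_files and run_all are hypothetical, and counting the .txt files merely stands in for the real per-directory processing.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from glob import glob
from os.path import join


def count_txt_files(directory):
    """Hypothetical stand-in for the real work done on one sub-directory."""
    return len(glob(join(directory, '*.txt')))


def run_all(directories, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run count_txt_files on every directory and return {directory: result}."""
    with executor_cls(max_workers=max_workers) as executor:
        # list() consumes the iterator, so an exception raised in any
        # worker is re-raised here instead of passing unnoticed.
        results = list(executor.map(count_txt_files, directories))
    # executor.map() yields results in the same order as its input.
    return dict(zip(directories, results))
```

With ProcessPoolExecutor, run_all() should be called from under an `if __name__ == '__main__':` guard, as in the code above. Passing executor_cls=ThreadPoolExecutor switches to threads without any other change.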