Tags: python, python-3.x, multithreading, python-multithreading, concurrent.futures

Implementation of multithreading using concurrent.futures in Python


I have written a Python script that converts raw data (from an STM microscope) into PNG format, and it runs perfectly on my MacBook Pro.

Below is the simplified Python code:

for root, dirs, file in os.walk(path):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

But it takes 30-40 minutes for 100 files.

Now I want to reduce the processing time using multithreading (the concurrent.futures library). I tried to modify the code following a YouTube "Python Threading Tutorial" as an example.

But I have to pass too many arguments, such as root, dirs, and file, to the executor.map() method, and I don't know how to resolve this.
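For reference, executor.map() takes one iterable per parameter of the mapped function and zips them together, one call per position. A minimal illustration (the add function here is hypothetical, just to show the calling convention):

```python
import concurrent.futures

def add(a, b):
    return a + b

with concurrent.futures.ThreadPoolExecutor() as executor:
    # One iterable per parameter: calls add(1, 10), then add(2, 20)
    results = list(executor.map(add, [1, 2], [10, 20]))

print(results)  # [11, 22]
```

If instead you pass a single iterable of tuples (as os.walk() yields), each whole tuple arrives as the first argument, which is why a three-parameter function fails.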

Below is the simplified multithreaded Python code:

def raw_data(root, dirs, file):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(raw_data, root, dirs, file)

This raises: NameError: name 'root' is not defined

Any suggestion is appreciated. Thank you.


Solution

  • Thanks for the advice, Iain Shelvington & Thenoneman.

    pathlib does reduce the clutter I was having in my code.

    ProcessPoolExecutor worked for my CPU-intensive function.

      with concurrent.futures.ProcessPoolExecutor() as executor:
          executor.map(raw_data, os.walk(path))
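Note that executor.map() passes each (root, dirs, files) tuple from os.walk() as a single argument, so the worker must accept one tuple and unpack it inside. A minimal sketch of that pattern (data_conv is assumed to take a full file path; raw_data here returns the matched paths so the behavior is visible):

```python
import concurrent.futures
import os

def raw_data(walk_entry):
    """Worker: receives one (root, dirs, files) tuple yielded by os.walk()."""
    root, dirs, files = walk_entry
    converted = []
    for name in files:
        if name.endswith(".sm4"):
            # data_conv(os.path.join(root, name), ...)  # real conversion goes here
            converted.append(os.path.join(root, name))
    return converted

if __name__ == "__main__":
    path = "."  # directory tree to scan; adjust as needed
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each walk tuple is handed whole to one raw_data call in a worker process
        for result in executor.map(raw_data, os.walk(path)):
            print(result)
```

ProcessPoolExecutor also requires the worker function to be picklable (defined at module top level), which is why raw_data is not a lambda or nested function here.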