Search code examples
pythonmultithreadingparallel-processingprocess-pool

Python ProcessPoolExecutor() not running code in parallel


I am trying to run two separate Python in parallel. To do this I am using the ProcessPoolExecutor() from concurrent.futures module. I have made two test scripts which I want to run in parallel:

Script 1:

import time

def run():
    print('starting script 1')
    t = 3
    print(f'Running script 1: Sleeping {t} second(s)')
    time.sleep(t)
    return None

if __name__ == '__main__':
    run()

Script 2:

import time

def run():
    print('starting script 2')
    t = 4
    print(f'Sleeping {t} second(s)')
    time.sleep(t)
    return None

if __name__ == '__main__':
    run()

These scripts simply sleep for 3 and 4 seconds, respectively. To call these scripts I use the following code:

import time
import concurrent.futures
import test_script_1, test_script_2

if __name__ == '__main__':

    start = time.perf_counter()
    print("Starting with scripts execution")
    script_list = ['test_script_1', 'test_script_2']

    with concurrent.futures.ProcessPoolExecutor() as executor:
        for script in script_list:
            executor.map(exec(f'{script}.run()'))



    print('all scripts done')
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start,2)} second(s)')

All scripts are in the same folder. I would expect these scripts to run in parallel, but they don't. The output I get in my console is the following:

Starting with scripts execution
starting script 1
Running script 1: Sleeping 3 second(s)
starting script 2
Sleeping 4 second(s)
all scripts done
Finished in 7.0 second(s)

As you can see, the script takes 7 seconds to run in total, rather than the expected 4 seconds. What am I doing wrong?

I tried switching to ThreadPoolExecutor(), which does the exact same thing.

Update:

I have implemented the code provided by OM222O:

import time
from concurrent.futures import ProcessPoolExecutor as PPE
from os import cpu_count
import test_script_1, test_script_2

if __name__ == '__main__':
    start = time.perf_counter()
    print("Starting with scripts execution")
    print("Available CPU cores: ", cpu_count())
    script_list = [test_script_1.run, test_script_2.run]

    with PPE() as executor:
        futures = [executor.submit(script) for script in script_list]

    # wait for scripts to finish execution
    # for future in futures:
    #     future.result()

    print('all scripts done')
    finish = time.perf_counter()
    print(f'Finished in {(finish - start):.2f} seconds')

This seems to be working. The total execution time is 4 seconds. However, the console is not printing the print() statements in the individual scripts:

Starting with scripts execution
Available CPU cores:  40
all scripts done
Finished in 4.40 seconds

If I switch to the ThreadPoolExecutor() the script outputs are printed:

Starting with scripts execution
Available CPU cores:  40
starting script 1
Running script 1: Sleeping 3 second(s)
starting script 2
Sleeping 4 second(s)
all scripts done
Finished in 4.00 seconds

However, since the actual scripts I intend to run in parallel are CPU heavy processes and not I/O processes I would like to use ProcessPoolExecutor() and not ThreadPoolExecutor()

I am using Spyder 5.4.2, see below.

enter image description here


Solution

  • To be honest, I'm not sure why your code is running for 7 seconds, but you are using map incorrectly; the whole point of map is to not do for x in y, which is exactly what you're doing. I think the way you wrote your code means that it's executing all the scripts (which takes 7 seconds) for every item in the list. Also check that you actually have multiple cores available, or try passing os.cpu_count() to PPE, i.e.: PPE(os.cpu_count()).

    This works for me (I only included main.py since the other modules are easy to copy from the image):

    main.py:

    import time
    from concurrent.futures import ProcessPoolExecutor as PPE
    from os import cpu_count
    from s1 import *
    from s2 import *
    
    if __name__ == '__main__':
        start = time.perf_counter()
        print("Starting with scripts execution")
        print("Available CPU cores: ", cpu_count())
        script_list = [script_1, script_2]
    
        with PPE() as executor:
            futures = [executor.submit(script) for script in script_list]
    
        # wait for scripts to finish execution
        for future in futures:
            future.result()
    
        print('all scripts done')
        finish = time.perf_counter()
        print(f'Finished in {(finish - start):.2f} seconds')
    

    Here are the modules and the results: enter image description here