I'm trying to learn how to implement multiprocessing for computing Monte Carlo simulations. I reproduced the code from this simple tutorial, where the aim is to compute an integral; I also compare the result to the answer from WolframAlpha and compute the estimation error. The first part of my code has no problems and just defines the integration function and declares some constants:
import numpy as np
import multiprocessing as mp
import time

def integrate(iterations):
    np.random.seed()
    mc_sum = 0
    chunks = 10000
    chunk_size = int(iterations / chunks)
    for i in range(chunks):
        u = np.random.uniform(size=chunk_size)
        mc_sum += np.sum(np.exp(-u * u))
    normed = mc_sum / iterations
    return normed

wolfram_answer = 0.746824132812427
mc_iterations = 1000000000
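As a quick sanity check on that constant (my addition, not part of the tutorial): the estimator is the sample mean of exp(-u * u) for u uniform on [0, 1], which converges to the integral of exp(-x^2) from 0 to 1, and that integral has the closed form (sqrt(pi)/2) * erf(1):

from math import erf, pi, sqrt

# Closed form of the target integral: ∫₀¹ exp(-x²) dx = (√π / 2) · erf(1)
closed_form = sqrt(pi) / 2 * erf(1)
print(closed_form)  # 0.7468241328124271 -- matches wolfram_answer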
But there's some very spooky stuff that happens in the next two parts (I've labelled them because it's important). First (labelled "BLOCK 1"), I run the simulation without any multiprocessing at all, just to get a benchmark. After that (labelled "BLOCK 2"), I do the same thing with a multiprocessing step. If you're reproducing this, you may want to adjust the num_procs variable depending on how many cores your machine has:
#### BLOCK 1
single_before = time.time()
single = integrate(mc_iterations)
single_after = time.time()
single_duration = np.round(single_after - single_before, 3)
error_single = (wolfram_answer - single) / wolfram_answer

print(mc_iterations, "iterations on single-thread:",
      single_duration, "seconds.")
print("Estimation error:", error_single)
print("")
#### BLOCK 2
if __name__ == "__main__":
    num_procs = 8
    multi_iterations = int(mc_iterations / num_procs)

    multi_before = time.time()
    pool = mp.Pool(processes=num_procs)
    multi_result = pool.map(integrate, [multi_iterations] * num_procs)
    multi_result = np.array(multi_result).mean()
    multi_after = time.time()
    multi_duration = np.round(multi_after - multi_before, 3)
    error_multi = (wolfram_answer - multi_result) / wolfram_answer

    print(num_procs, "threads with", multi_iterations, "iterations each:",
          multi_duration, "seconds.")
    print("Estimation error:", error_multi)
The output is:
1000000000 iterations on single-thread: 37.448 seconds.
Estimation error: 1.17978774235e-05
8 threads with 125000000 iterations each: 54.697 seconds.
Estimation error: -5.88380936901e-06
So, the multiprocessing is slower. That on its own is not unheard of; maybe the overhead of the multiprocessing simply outweighs the gains from the parallelization?
But, that is not what is happening. Watch what happens when I merely comment out the first block:
#### BLOCK 1
##single_before = time.time()
##single = integrate(mc_iterations)
##single_after = time.time()
##single_duration = np.round(single_after - single_before, 3)
##error_single = (wolfram_answer - single) / wolfram_answer
##
##print(mc_iterations, "iterations on single-thread:",
##      single_duration, "seconds.")
##print("Estimation error:", error_single)
##print("")
#### BLOCK 2
if __name__ == "__main__":
    num_procs = 8
    multi_iterations = int(mc_iterations / num_procs)

    multi_before = time.time()
    pool = mp.Pool(processes=num_procs)
    multi_result = pool.map(integrate, [multi_iterations] * num_procs)
    multi_result = np.array(multi_result).mean()
    multi_after = time.time()
    multi_duration = np.round(multi_after - multi_before, 3)
    error_multi = (wolfram_answer - multi_result) / wolfram_answer

    print(num_procs, "threads with", multi_iterations, "iterations each:",
          multi_duration, "seconds.")
    print("Estimation error:", error_multi)
The output is:
8 threads with 125000000 iterations each: 6.662 seconds.
Estimation error: 3.86063069069e-06
That's right -- the time to complete the multiprocessing goes down from 55 seconds to less than 7 seconds! And that's not even the weirdest part. Watch what happens when I move Block 1 to be after Block 2:
#### BLOCK 2
if __name__ == "__main__":
    num_procs = 8
    multi_iterations = int(mc_iterations / num_procs)

    multi_before = time.time()
    pool = mp.Pool(processes=num_procs)
    multi_result = pool.map(integrate, [multi_iterations] * num_procs)
    multi_result = np.array(multi_result).mean()
    multi_after = time.time()
    multi_duration = np.round(multi_after - multi_before, 3)
    error_multi = (wolfram_answer - multi_result) / wolfram_answer

    print(num_procs, "threads with", multi_iterations, "iterations each:",
          multi_duration, "seconds.")
    print("Estimation error:", error_multi)
#### BLOCK 1
single_before = time.time()
single = integrate(mc_iterations)
single_after = time.time()
single_duration = np.round(single_after - single_before, 3)
error_single = (wolfram_answer - single) / wolfram_answer

print(mc_iterations, "iterations on single-thread:",
      single_duration, "seconds.")
print("Estimation error:", error_single)
print("")
The output is:
8 threads with 125000000 iterations each: 54.938 seconds.
Estimation error: 7.42415402896e-06
1000000000 iterations on single-thread: 37.396 seconds.
Estimation error: 9.79800494235e-06
We're back to the slow output again, which is completely crazy! Isn't Python supposed to be interpreted? I know that statement comes with a hundred caveats, but I took for granted that the code gets executed line by line, so stuff that comes afterwards (outside of functions, classes, etc.) can't affect the stuff from before, because it hasn't been "looked at" yet.
So, how can the stuff that gets executed after the multiprocessing step has concluded, retroactively slow down the multiprocessing code?
Finally, the fast behavior is restored merely by indenting Block 1 to be inside the if __name__ == "__main__" block, because of course it is:
#### BLOCK 2
if __name__ == "__main__":
    num_procs = 8
    multi_iterations = int(mc_iterations / num_procs)

    multi_before = time.time()
    pool = mp.Pool(processes=num_procs)
    multi_result = pool.map(integrate, [multi_iterations] * num_procs)
    multi_result = np.array(multi_result).mean()
    multi_after = time.time()
    multi_duration = np.round(multi_after - multi_before, 3)
    error_multi = (wolfram_answer - multi_result) / wolfram_answer

    print(num_procs, "threads with", multi_iterations, "iterations each:",
          multi_duration, "seconds.")
    print("Estimation error:", error_multi)

    #### BLOCK 1
    single_before = time.time()
    single = integrate(mc_iterations)
    single_after = time.time()
    single_duration = np.round(single_after - single_before, 3)
    error_single = (wolfram_answer - single) / wolfram_answer

    print(mc_iterations, "iterations on single-thread:",
          single_duration, "seconds.")
    print("Estimation error:", error_single)
    print("")
The output is:
8 threads with 125000000 iterations each: 7.293 seconds.
Estimation error: 1.10350027622e-05
1000000000 iterations on single-thread: 31.035 seconds.
Estimation error: 2.53582945763e-05
And the fast behavior is also restored if you keep Block 1 inside the if block but move it above where num_procs is defined (not shown here because this question is already getting long).
So, what on Earth is causing this behavior? I'm guessing it's some kind of race condition to do with threading and process branching, but at my level of expertise it might as well be that my Python interpreter is haunted.
This is because you are using Windows. On Windows, each subprocess is generated using the 'spawn' method, which essentially starts a new Python interpreter and imports your module instead of forking the process. This is a problem because all the code outside the if __name__ == '__main__' guard is executed again in every child. That is exactly what slows down your pool: whenever Block 1 sits at module level, each of the 8 spawned workers re-runs the full one-billion-iteration single-threaded benchmark at import time, before it can even begin its share of the map. Putting the multiprocessing code itself at the top level is even worse and can lead to a multiprocessing bomb, because each child would spawn children of its own until you run out of memory.
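You can see the re-import directly with a tiny standalone script (the file name and print messages here are illustrative, not from the question). Run it and the module-level print fires once in the parent and once more in every worker:

# spawn_demo.py -- minimal sketch of 'spawn' re-executing module-level code
import multiprocessing as mp
import os

# Top-level statement: under 'spawn', this runs once in the parent and
# once more in EVERY worker process when it re-imports this module.
print("module-level code running in process", os.getpid())

def work(x):
    return x * x

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the default on Windows
    with mp.Pool(processes=2) as pool:
        print(pool.map(work, [1, 2, 3]))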
This is actually warned about in the docs:

Safe importing of main module

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

...

Instead one should protect the "entry point" of the program by using if __name__ == '__main__'

...

This allows the newly spawned Python interpreter to safely import the module...
That section used to be called "Windows" in the older Python 2 docs.
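To make the original script behave the same regardless of start method, one option is to move everything with side effects out of module scope. Here is a sketch of that restructuring (the main() layout is my suggestion, not the only valid fix):

import numpy as np
import multiprocessing as mp
import time

def integrate(iterations):
    # Unchanged worker function; safe at module level because
    # defining a function has no side effects on import.
    np.random.seed()
    mc_sum = 0
    chunks = 10000
    chunk_size = int(iterations / chunks)
    for i in range(chunks):
        u = np.random.uniform(size=chunk_size)
        mc_sum += np.sum(np.exp(-u * u))
    return mc_sum / iterations

def main():
    wolfram_answer = 0.746824132812427
    mc_iterations = 1000000000
    num_procs = 8

    # Single-process benchmark: now runs only in the parent, because
    # spawned workers import this module without calling main().
    before = time.time()
    single = integrate(mc_iterations)
    print(mc_iterations, "iterations on a single process:",
          np.round(time.time() - before, 3), "seconds.")
    print("Estimation error:", (wolfram_answer - single) / wolfram_answer)

    # Multiprocessing benchmark.
    before = time.time()
    with mp.Pool(processes=num_procs) as pool:
        results = pool.map(integrate, [mc_iterations // num_procs] * num_procs)
    multi = np.array(results).mean()
    print(num_procs, "processes:", np.round(time.time() - before, 3), "seconds.")
    print("Estimation error:", (wolfram_answer - multi) / wolfram_answer)

if __name__ == "__main__":
    main()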