Tags: python, multithreading, cpython, gil

Can I apply multithreading for computationally intensive tasks in Python?


Update: To save your time, here is the answer directly: Python cannot utilize multiple CPU cores at the same time if you write your code in pure Python. But Python can utilize multiple cores at the same time when it calls functions or packages written in C, such as NumPy.


I have heard that "multithreading in Python is not real multithreading, because of the GIL". I have also heard that "Python multithreading is okay for handling IO-intensive tasks, but not computationally intensive tasks, because only one thread runs at a time".

But my experience made me rethink this question. It shows that even for a computationally intensive task, Python multithreading can accelerate the computation nearly linearly. (Before multithreading, it took me 300 seconds to run the following program; with multithreading, it took 100 seconds.)

The following figures show that 5 threads were created by Python (the CPython interpreter) using the threading package, and all the CPU cores are at nearly 100% usage.

I think the screenshots prove that the 5 CPU cores are running at the same time.

So can anyone give me an explanation? Can I apply multithreading for computationally intensive tasks in Python? Or can multiple threads/cores run at the same time in Python?

My code:


import threading
import time
import numpy as np
from scipy import interpolate


number_list = list(range(10))
# One shared lock for all workers. (Writing `with threading.Lock():`
# inside the loop creates a fresh lock on every iteration and therefore
# provides no mutual exclusion at all.)
list_lock = threading.Lock()

def image_interpolation():
    while True:
        number = None
        with list_lock:
            if number_list:
                number = number_list.pop()
        if number is not None:
            # Make a fake image - you can use yours.
            image = np.ones((20000, 20000))
            # Make your orig array (skipping the extra dimensions).
            orig = np.random.rand(12800, 16000)
            # Make its coordinates; x is horizontal.
            x = np.linspace(0, image.shape[1], orig.shape[1])
            y = np.linspace(0, image.shape[0], orig.shape[0])
            # Make the interpolator function. (Note: interp2d is
            # deprecated in recent SciPy versions.)
            f = interpolate.interp2d(x, y, orig, kind='linear')

        else:
            return 1

workers = 5
thd_list = []
t1 = time.time()

for i in range(workers):
    thd = threading.Thread(target=image_interpolation)
    thd.start()
    thd_list.append(thd)

for thd in thd_list:
    thd.join()

t2 = time.time()
print("total time cost with multithreading: " + str(t2-t1))
number_list = list(range(10))

# A single call is enough here: the function loops until number_list
# is empty.
image_interpolation()

t3 = time.time()

print("total time cost without multithreading: " + str(t3-t2))

output is:

total time cost with multithreading: 112.71922039985657
total time cost without multithreading: 328.45561170578003

screenshot of top during multithreading

screenshot of top -H during multithreading

screenshot of top, after pressing 1, during multithreading

screenshot of top -H without multithreading


Solution

  • As you mentioned, Python has a "global interpreter lock" (GIL) that prevents two threads from running Python code at the same time. The reason that multi-threading can speed up IO-bound tasks is that Python releases the GIL when, for example, listening on a network socket or waiting for a disk read. So the GIL does not prevent two lots of work being done simultaneously by your computer; it prevents two Python threads in the same Python process from running simultaneously.
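
A minimal sketch (my own code, not from the original post) of why IO-bound work benefits: `time.sleep` releases the GIL the same way a socket or disk wait does, so five 1-second waits overlap instead of queuing:

```python
import threading
import time

def fake_io_task():
    # The GIL is released while sleeping, just as it would be while
    # waiting on a socket or a disk read.
    time.sleep(1)

t0 = time.time()
threads = [threading.Thread(target=fake_io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - t0

# Five 1-second waits overlap, so the total is roughly 1 s, not 5 s.
print(f"5 threads, 1 s sleep each: {elapsed:.2f} s")
```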

    In your example, you use numpy and scipy. These are largely written in C and utilise libraries (BLAS, LAPACK, etc.) written in C/Fortran/assembly. When you perform operations on numpy arrays, it is akin to listening on a socket in that the GIL is released. Once the GIL is released and the numpy array operation is called, numpy gets to decide how to perform the work. If it wants, it can spawn other threads or processes, and the BLAS subroutines it calls might spawn other threads. Precisely if/how this is done can be configured at build time if you compile numpy from source.

    So, to summarise, you've found an exception to the rule. If you were to repeat the experiment using only pure Python functions, you would get quite different results (e.g. see the "Comparison" section of the page linked to above).
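
That repeat experiment might look like the following sketch (my own code, not from the linked page): a CPU-bound pure-Python loop gains nothing from threads, because only one thread can hold the GIL at a time:

```python
import threading
import time

def count(n=5_000_000):
    # Pure-Python arithmetic: the GIL is held the whole time.
    i = 0
    while i < n:
        i += 1

t0 = time.time()
for _ in range(4):
    count()
serial = time.time() - t0

t0 = time.time()
threads = [threading.Thread(target=count) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - t0

# Unlike the numpy version, the threaded run is no faster here (it is
# often slightly slower, due to GIL contention).
print(f"serial: {serial:.2f} s, threaded: {threaded:.2f} s")
```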