
Does using multiple threads in Python really produce overhead (GIL)?


From the Python wiki:

However the GIL can degrade performance even when it is not a bottleneck. Summarizing those slides: The system call overhead is significant, especially on multicore hardware. Two threads calling a function may take twice as much time as a single thread calling the function twice. The GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads. And it prevents signals from being delivered.

I tried running a simple function first in a single thread and then compared that with running it in 5 threads:

from threading import Thread
import time


def count(n):
    while n > 0:
        n -= 1


a=time.time()
count(100000000)
count(100000000)
count(100000000)
count(100000000)
count(100000000)

print(time.time()-a)

a = time.time()
t1 = Thread(target=count, args=(100000000,))
t1.start()
t2 = Thread(target=count, args=(100000000,))
t2.start()
t3 = Thread(target=count, args=(100000000,))
t3.start()
t4 = Thread(target=count, args=(100000000,))
t4.start()
t5 = Thread(target=count, args=(100000000,))
t5.start()
t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
print(time.time()-a)

I experimented with different numbers of threads, and each time the multi-threaded version ran (marginally) faster. I am using Python 3.7.3 (64-bit) on a Windows 10 machine (64-bit) with an Intel i5 4-core processor (8 logical cores).

I am really just starting to learn about threading, and it's very frustrating to be stuck right at the beginning. I also found the same inconsistencies in some other articles I found by googling, which used basically the same example code. I guess my question is whether someone could provide a more appropriate example or a link to a more clearly done study.


Solution

  • Does using multiple threads in Python really produce overhead (GIL)?

    Yes, and that's consistent with the results of your experiment. It's not clear to me why you suppose that the Python wiki -- a well-curated source -- might be lying to you.

    I also found the same inconsistencies in some other articles I found by googling, which used basically the same example code.

    I don't see what you claim to be inconsistent. You say that your multi-threaded code ran only marginally faster than your single-threaded code, but in the ideal case, a five-threaded computation would run in one fifth the time of a single-threaded version of the same computation. That's not a marginal difference.

    I am really just starting to learn about threading, and it's very frustrating to be stuck right at the beginning.

    It's unclear to me why you think you're stuck. You're running a multi-threaded computation successfully, and apparently even seeing a little speedup. But if you mean that you want to be able to observe more speedup, then Python is not the ideal platform for you. You can see good speedup from multithreading in Python, but it depends heavily on the details of the workload.

    Indeed, the issue description you quoted isn't even central to the question. The biggest challenge to getting good speedup from multithreaded Python code is not the overhead that passage describes, but rather the bottleneck problem to which it refers: only one thread at a time can hold the GIL and execute Python bytecode. The GIL is very much a double-edged sword. It protects you from many of the complications attending multithreaded programming (so you will not learn about those from studying multithreading via Python), but in order to be sufficiently universal, it imposes significant restrictions on the actual concurrency that Python threads can achieve.
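
To make the dependence on workload concrete, here is a minimal sketch, assuming a standard CPython build with the GIL. It times five calls of a task run back to back and then run in five threads, once for a pure-Python counting loop like the one in the question and once for time.sleep, which releases the GIL while it waits and stands in here for blocking I/O. The names cpu_task, io_task, timed and compare are made up for this illustration, and the loop size and sleep duration are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_task(n):
    # Pure-Python loop: the running thread holds the GIL while it counts.
    while n > 0:
        n -= 1


def io_task(seconds):
    # time.sleep releases the GIL; used here as a stand-in for blocking I/O.
    time.sleep(seconds)


def timed(run):
    # Wall-clock duration of a zero-argument callable.
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def compare(task, arg, workers=5):
    """Return (sequential_seconds, threaded_seconds) for `workers` calls of task(arg)."""
    def sequential():
        for _ in range(workers):
            task(arg)

    def threaded():
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces every call to finish before the timer stops.
            list(pool.map(task, [arg] * workers))

    return timed(sequential), timed(threaded)


if __name__ == "__main__":
    cases = [
        ("cpu-bound loop", cpu_task, 2_000_000),
        ("sleep (I/O stand-in)", io_task, 0.2),
    ]
    for name, task, arg in cases:
        seq, thr = compare(task, arg)
        print(f"{name}: sequential {seq:.2f}s, threaded {thr:.2f}s")
```

On such a build, the sleeping tasks should finish in roughly the time of a single sleep when threaded, while the counting loop should show little or no gain. The exact figures will vary with the machine, the operating system and the Python version, so treat any one run as indicative rather than as a benchmark.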