python python-3.x multithreading cpython gil

Do we ever need to synchronise threads in python?

According to the Python wiki page on the GIL:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.

When multiple threads try to operate on a shared variable at the same time, we need to synchronise the threads to avoid race conditions. We achieve this by acquiring a lock.

But since Python uses the GIL, only one thread is allowed to execute Python bytecode at a time, so I thought this problem should never occur in Python programs. However, I saw an article about thread synchronisation in Python with a code snippet that causes a race condition: https://www.geeksforgeeks.org/multithreading-in-python-set-2-synchronization/

Can someone please explain to me how this is possible?

Code

import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0

    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i,x))

Output:

Iteration 0: x = 175005
Iteration 1: x = 200000
Iteration 2: x = 200000
Iteration 3: x = 169432
Iteration 4: x = 153316
Iteration 5: x = 200000
Iteration 6: x = 167322
Iteration 7: x = 200000
Iteration 8: x = 169917
Iteration 9: x = 153589

Solution

Only one thread at a time can execute bytecode. That ensures that memory allocation and the internal state of built-in objects such as lists, dicts and sets always stay consistent, without the need for any explicit control on the Python side of the code.

(Update: even with the free-threaded builds available from Python 3.13 on, these built-in containers (lists, dicts and sets) are still thread-safe. From the point of view of Python code, the thread-safety considerations do not change, for better or worse.)

However, the += 1 operation is not atomic. Because integers are immutable objects, it fetches the current value of the variable, creates (or gets a reference to) a new object holding the result of the operation, and then stores that object back in the original global variable. The bytecode for that can be seen with the help of the dis module:


In [2]: import dis

In [3]: global counter

In [4]: counter = 0

In [5]: def inc():
   ...:     global counter
   ...:     counter += 1
   ...: 

In [6]: dis.dis(inc)
  1           0 RESUME                   0

  3           2 LOAD_GLOBAL              0 (counter)
             14 LOAD_CONST               1 (1)
             16 BINARY_OP               13 (+=)
             20 STORE_GLOBAL             0 (counter)
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

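The running thread can change between these bytecode instructions. If a switch happens after one thread has executed LOAD_GLOBAL but before it reaches STORE_GLOBAL, another thread can load the same old value, and one of the two increments is lost.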
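To make the lost update concrete, here is a minimal sketch that replays one unlucky interleaving by hand in a single thread. The names a_loaded and b_loaded are hypothetical stand-ins for the value each thread has on its stack after LOAD_GLOBAL; no real threads are involved, so the result is deterministic.

```python
# Hand-simulated interleaving of two threads that each run `counter += 1`.
# a_loaded / b_loaded are illustrative names, not part of the original code.
counter = 0

# Thread A executes LOAD_GLOBAL and reads the current value.
a_loaded = counter

# A thread switch happens here; thread B executes LOAD_GLOBAL
# and reads the same old value.
b_loaded = counter

# Thread A executes BINARY_OP and STORE_GLOBAL.
counter = a_loaded + 1

# Thread B executes BINARY_OP and STORE_GLOBAL, overwriting A's result.
counter = b_loaded + 1

# Two increments ran, but only one of them is visible.
print(counter)
```

Both "threads" performed an increment, yet the counter only advanced by one, which is the same effect that produces totals below 200000 in the question.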

So, as in lower-level code, this kind of concurrency requires a lock. The inc function should look like this:

In [7]: from threading import Lock

In [8]: inc_lock = Lock()

In [9]: def inc():
   ...:     global counter
   ...:     with inc_lock:
   ...:         counter += 1
   ...:

This ensures that no other thread can enter the locked block while one thread is in the middle of the whole counter += 1 part. Other threads can still run bytecode, but any thread that reaches with inc_lock has to wait until the lock is released.

(The disassembly here would be significantly longer, but that is due to the semantics of the with block, not to the lock itself, so it is unrelated to the problem we are looking at. The lock can be acquired through other means as well; a with block is just the most convenient.)
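As a sketch of both ways of taking the lock, the script below runs one thread that uses the with block and one that calls acquire() and release() explicitly. The helper names inc_with_block, inc_explicit, worker and run are illustrative, not from the original code; the thread count and iteration count mirror the question.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def inc_with_block():
    """Increment the counter, taking the lock through a with block."""
    global counter
    with counter_lock:
        counter += 1


def inc_explicit():
    """Same increment, with explicit acquire/release calls."""
    global counter
    counter_lock.acquire()
    try:
        counter += 1
    finally:
        # Release even if the protected code raises.
        counter_lock.release()


def worker(func, n):
    """Call func n times."""
    for _ in range(n):
        func()


def run(n=100_000):
    """Reset the counter, run two threads, and return the final value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=worker, args=(inc_with_block, n)),
        threading.Thread(target=worker, args=(inc_explicit, n)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # prints 200000
```

Because every read-modify-write of counter happens while the same lock is held, the total is always 2 * n, whichever of the two styles a thread uses.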

Also, this is one of the greatest advantages of async code over threads: in async code, one's Python code always runs without being interrupted unless it explicitly hands control back to the event loop, by using await or one of the async for / async with constructs.
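A minimal sketch of that point, assuming a plain asyncio event loop: two coroutines increment a shared counter with no lock. The names inc_many and main are hypothetical. Each coroutine only yields at the explicit await, which sits between increments and never inside the read-modify-write, so no update is lost.

```python
import asyncio

counter = 0


async def inc_many(n):
    """Increment the shared counter n times, yielding after each increment."""
    global counter
    for _ in range(n):
        # No await inside this statement, so no other task can run
        # between the load and the store.
        counter += 1
        # Explicit yield point: other tasks may run here.
        await asyncio.sleep(0)


async def main(n=10_000):
    """Run two incrementing coroutines concurrently and return the total."""
    global counter
    counter = 0
    await asyncio.gather(inc_many(n), inc_many(n))
    return counter


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints 20000
```

If an await were placed between reading counter and writing it back, for example by loading the value into a local, awaiting, and then storing, the same lost-update problem would reappear and an asyncio.Lock would be needed.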