Tags: python, multithreading, python-multithreading

Python multithreading program is giving unexpected output


I know that there is no guarantee regarding the order of execution for threads. But my doubt is: when I run the code below,

import threading

def doSomething():
    print("Hello ")

d = threading.Thread(target=doSomething, args=())
d.start()
print("done")

Output that is coming is either

Hello done

or this

Hello 
done

Maybe if I try enough times it might also give me the below

done
Hello

But I am not convinced by the first output. The order can differ, but how can both outputs end up on the same line? Does that mean that one thread is messing with the other thread's work?


Solution

  • This is a classic race condition. I can't reproduce it myself, and it will likely vary by interpreter implementation and the precise configuration applied to stdout. On Python interpreters without a GIL, there is essentially no protection against races, and this behavior is expected to a certain extent. Python interpreters do tend to protect you from egregious data corruption due to threading, unlike C/C++, but even if they ensure every byte written ends up actually printed, they usually make no explicit guarantee against interleaving; Hdelolnoe would be a possible (if fairly unlikely, given typical implementations) output when you make no effort whatsoever to synchronize access to stdout.

    On CPython, the GIL protects you more, and writing a single string to stdout is more likely to be atomic, but you're not writing a single string. Essentially, print writes its arguments one by one to the output file object as it goes; it doesn't batch them into a single string and call write just once. What this means is that:

    print("Hello ")  # Implicitly outputs default end argument of '\n' after printing provided args
    

    is roughly equivalent to:

    sys.stdout.write("Hello ")
    sys.stdout.write("\n")
    

    If the underlying stack of file objects that implements sys.stdout decides to engage in real I/O in response to the first write, it will release the GIL before performing the actual write, allowing the main thread to catch up and potentially grab the GIL before the worker thread is given a chance to write the newline. The main thread then outputs the done, and the newlines from each print come out in some unspecified (and irrelevant) order based on further potential races.
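You can see for yourself that print issues separate write calls (on CPython, where this is an implementation detail rather than a documented guarantee) by capturing them with a small recording file object; RecordingWriter here is a made-up helper for this sketch, not anything from the standard library:

```python
import io

class RecordingWriter(io.StringIO):
    """Hypothetical stand-in for sys.stdout that records each write call."""
    def __init__(self):
        super().__init__()
        self.calls = []

    def write(self, s):
        self.calls.append(s)
        return super().write(s)

w = RecordingWriter()
print("Hello ", file=w)
print(w.calls)  # On CPython: ['Hello ', '\n'] -- two separate write calls
```

The gap between those two calls is exactly where the main thread's output can sneak in.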

    Assuming you're on CPython, you could probably fix this by changing the code to this equivalent code using single write calls:

    import threading
    import sys
    
    def doSomething():
        sys.stdout.write("Hello \n")
    
    d = threading.Thread(target=doSomething)  # If it takes no arguments, no need to pass args
    d.start()
    sys.stdout.write("done\n")
    

    and you'd be back to a race condition that only swaps the order, without interleaving (the language spec guarantees nothing, but most reasonable implementations make a single write atomic in this case). If you want it to work with real guarantees, without relying on quirks of the implementation, you have to synchronize:

    import threading
    
    lck = threading.Lock()
    
    def doSomething():
        with lck:
            print("Hello ")
    
    d = threading.Thread(target=doSomething)
    d.start()
    with lck:
        print("done")
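As a sanity check of the locking approach, here is a sketch of my own (the thread count and the StringIO buffer standing in for stdout are assumptions for illustration): many threads each perform two separate writes, mimicking print's payload-then-newline pattern, and holding the lock across both writes keeps each line intact.

```python
import io
import threading

buf = io.StringIO()  # stands in for sys.stdout so we can inspect the result
lck = threading.Lock()

def worker(i):
    # Two separate writes, like print's payload then newline, but the
    # lock makes the pair atomic with respect to the other threads.
    with lck:
        buf.write(f"thread {i}")
        buf.write("\n")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every line comes out whole: no thread's newline was separated
# from its payload by another thread's output.
lines = buf.getvalue().splitlines()
assert sorted(lines) == sorted(f"thread {i}" for i in range(50))
```

Note that every writer must take the same lock; a single unsynchronized print elsewhere can still interleave.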