When to use thread local memory in Python?

I am just starting with Python and stumbled upon thread-local storage. I wrote a small program that uses threads:

#!/usr/bin/env python3

import logging
import signal
import threading
import time

class WorkerThread(threading.Thread):
    def __init__(self, idx):
        threading.Thread.__init__(self)
        self.thread_index = idx
        self.thread_alive = True

    def run(self):
        logging.info(f'Thread {self.thread_index} is starting up!')

        while self.thread_alive:
            logging.info(f'Thread {self.thread_index} is still running.')
            time.sleep(1)

        logging.info(f'Thread {self.thread_index} is stopping!')

    def kill(self):
        self.thread_alive = False

def main():
    logging.basicConfig(format = '%(levelname)s: %(message)s', level = logging.INFO)

    def signal_handler(sig, frame):
        logging.info('Ctrl+c pressed, killing threads and shutting down ...')
        # threads is only read here, never rebound, so no nonlocal is needed
        for thread in threads:
            thread.kill()

    signal.signal(signal.SIGINT, signal_handler)

    logging.info('Signal handler registered, starting threads ...')

    threads = []
    for i in range(0, 3):
        thread = WorkerThread(i)
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

    signal.signal(signal.SIGINT, signal.SIG_DFL)

if __name__ == '__main__':
    main()

This program works as expected and prints something like:

> python3 main.py
INFO: Signal handler registered, starting threads ...
INFO: Thread 0 is starting up!
INFO: Thread 0 is still running.
INFO: Thread 1 is starting up!
INFO: Thread 2 is starting up!
INFO: Thread 1 is still running.
INFO: Thread 2 is still running.
INFO: Thread 0 is still running.
INFO: Thread 1 is still running.
INFO: Thread 2 is still running.
INFO: Thread 0 is still running.
INFO: Thread 2 is still running.
INFO: Thread 1 is still running.
INFO: Thread 2 is still running.
INFO: Thread 1 is still running.
INFO: Thread 0 is still running.
INFO: Thread 1 is still running.
INFO: Thread 2 is still running.
INFO: Thread 0 is still running.
^CINFO: Ctrl+c pressed, killing threads and shutting down ...
INFO: Thread 2 is stopping!
INFO: Thread 1 is stopping!
INFO: Thread 0 is stopping!

In this case the thread_index and thread_alive variables are specific to each thread because they are attributes of each thread object. But there is also threading.local(), which creates thread-local storage. Since I want my variables to be thread-specific, I tried using it for the class attributes:

# imports and shebang

class WorkerThread(threading.Thread):
    thread_index = threading.local()
    thread_alive = threading.local()

# everything else stays the same

But using this does not change anything, the output stays the same. So my questions are:

Is thread-local memory meant for another use case, or did the first program only work by accident?
What are the use cases for threading.local(), given that object-specific (non-static) variables seem to work too?

Solution

threading.local() is for cases where you cannot, or do not want to, modify the classes that implement the threads.

In the example above you are in full control: you created WorkerThread and you started the threads. You therefore know there is exactly one instance per running thread, and you can store values in the instance that is bound to that thread. That is why your initial example worked, and it is correct in this regard, not an accident.
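As for why the second version printed the same output: a likely explanation, assuming __init__ was left unchanged as the question states, is that the assignment self.thread_index = idx creates an instance attribute that shadows the class-level threading.local() object, so the thread-local object is never actually used. A minimal sketch of that effect (the class name Worker is made up for illustration):

```python
import threading

class Worker(threading.Thread):
    # Class-level thread-local object, as in the question's second version.
    thread_index = threading.local()

    def __init__(self, idx):
        super().__init__()
        # This creates an *instance* attribute that shadows the class
        # attribute above; the threading.local() object is left untouched.
        self.thread_index = idx

worker = Worker(7)

# Lookup on the instance finds the plain int stored by __init__ ...
print(type(worker.thread_index).__name__)

# ... while the thread-local object still sits, unused, on the class.
print(isinstance(Worker.thread_index, threading.local))
```

So both versions of the program end up storing the index in an ordinary per-object attribute.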

But you do not always control the threads. Sometimes a library or framework starts them, and you only provide the code that runs in them. In that case you cannot modify the Thread classes to add thread-specific variables.

Take a multithreaded web server as an example. You provide the functions that process incoming requests. You do not write the infrastructure that listens on the socket, parses HTTP requests, and so on; the framework handles all of that. It starts a pool of threads for you, and when a request comes in, the framework parses it and invokes your handler on a thread from the pool.

Now imagine you want to store some context for the request being processed (for example, the currently logged-in user) so that you can access it anywhere during request processing without passing it explicitly to every function. You can't add this currentUser variable to a thread class, because you don't control that class. But you can store it in a threading.local() object, and each request that is processed concurrently in a different thread will see its own copy of the value.
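A minimal sketch of that idea, using concurrent.futures.ThreadPoolExecutor as a stand-in for the framework's thread pool. The names request_context, handle_request, and current_user_greeting are hypothetical and only serve the illustration; a real framework would call your handler in its own way.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One shared object; its attributes are stored separately for each thread.
request_context = threading.local()

def current_user_greeting():
    # Somewhere deep in the call stack: reads the user without
    # having it passed in as an argument.
    return f"Hello, {request_context.current_user}!"

def handle_request(user):
    # Invoked by the "framework" on whichever pool thread is free.
    request_context.current_user = user
    try:
        return current_user_greeting()
    finally:
        # Pool threads are reused, so clear the value after the request.
        del request_context.current_user

# Stand-in for the framework's thread pool.
with ThreadPoolExecutor(max_workers=3) as pool:
    greetings = list(pool.map(handle_request, ["alice", "bob", "carol"]))

print(greetings)
```

Each handler call sees only the value set on its own thread, even though every call goes through the same request_context object. Clearing the attribute in the finally block is a precaution against a reused pool thread leaking one request's user into the next.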

The same applies to your own code. As a program grows more complex and you need to separate the infrastructure code (managing threads) from the application logic, you may not want to add thread-specific variables to the thread classes, and you can use threading.local() instead.