How to manage Python callbacks executed from a C library in a Qt/PySide6 application?

Before I describe my particular problem, please check my understanding of how Python works with DLLs in general.

I have an existing PySide6 application. When I call QCoreApplication::exec(), a Qt event loop starts running. Eventually, a signal gets emitted, and the event loop executes a slot that I define in Python.

The way I imagine this, once we call exec(), the python interpreter basically doesn't do anything until a slot defined in Python needs to execute. So, the Qt event loop, which "lives" in the QtCore dll, starts iterating, and the interpreter is still alive, but is blocked from doing anything. If I look at the system in Process Explorer, I can see a Python process containing several threads. One of them has a start address of Qt6Core.dll!QThread::start, a few are ucrtbase.dll, several are ntdll.dll, and one is python.exe.

If no signals are being emitted and no slots are being executed, my understanding is that the python interpreter can't do anything at all. It can only run if Qt tells it to, even to do background things like collect garbage or service OS signals.

Question 1: Is all of that correct?

At some point, a signal (Qt signal, not OS signal) will be emitted, and a slot will need to execute. When a slot is defined in Python, the Qt event loop will need to somehow execute that slot's Python code, and that code will need to have access to all the variables and things in the interpreter's environment.

Question 2: When that slot gets executed, what is happening to the Qt and Python threads? I imagine that the Qt event loop's thread tells the interpreter's thread to run some chunk of code, and the Qt thread just needs to sit and wait for the interpreter to finish executing it. Is that what's happening? Or does Qt's thread actually execute the interpreter's code somehow? Can each one block the other from executing?

My application needs a new feature now, and we need to use another library to talk to some hardware. There is a Python wrapper around the hardware's C library, and this wrapper lets us specify callbacks in python to be executed when the hardware detects an event. When I run this, another thread shows up in the list in Process Explorer, so the hardware's DLL must kick off another thread to poll for events. When something happens in that thread, it executes the Python callback, in which I print out the thread's ID (listed in Process Explorer as the hardware library's thread) and emit a Qt signal. When a slot connected to that signal executes, the thread ID printed is the one listed in Process Explorer as python.exe's thread.

Question 3: Both the slot and the callback are Python functions. Why does one execute in the Python interpreter's thread while the other executes in the hardware DLL's thread? I would expect that both of them would execute in the interpreter's thread, or they'd both execute in the threads of their respective DLLs (Qt and the hardware library). Are there different ways for DLLs to "call" the Python interpreter? And could this just be an artifact of how the logging module's %(thread)d attribute works?

So the question that actually brought me here:

Both the hardware callback and the Qt slot can manipulate variables in the Python environment, so I have to assume that I need to use some thread synchronization mechanism. Can I use Python's threading.Lock object (note that in this application I'm not currently using the threading module at all)? Would I need to use Qt's mutex objects? Or does it not matter at all?

One thing I've considered is just not putting the hardware interaction into the application at all. Instead, I could have all hardware operations in a separate python application from the PySide GUI. Then they could talk to each other via TCP or a named pipe, and it would avoid all of this. However, there's obviously a lot I don't understand about how all these threads are handled, so I'd like to learn a little first and be able to make a more informed decision.

Solution

The CPython interpreter does not "live" in a separate thread. It is just a DLL that exposes C functions which any application can call. The DLL also holds a few global structs containing information such as the loaded modules and the variables in them, and any thread in the application can reach those structs through the DLL's functions.

If no signals are being emitted and no slots are being executed, my understanding is that the python interpreter can't do anything at all. It can only run if Qt tells it to, even to do background things like collect garbage or service OS signals.

You are mostly correct. The interpreter checks for pending OS signals and triggers garbage collection while it is executing Python bytecode. If no Python bytecode is executing (for example, because the thread is stuck inside a C library or blocked on a lock), then neither of those happens.

When that slot gets executed, what is happening to the Qt and Python threads? I imagine that the Qt event loop's thread tells the interpreter's thread to run some chunk of code, and the Qt thread just needs to sit and wait for the interpreter to finish executing it. Is that what's happening?

No, it is all a single thread. Launching python.exe creates one thread. That same thread calls the C++ function that runs the Qt event loop, and when a slot fires, that same thread calls the C function that executes the Python bytecode. Think of the interpreter as just a fancy C function that parses strings and calls other C functions, and of Qt as another fancy C function; the two simply end up calling one another.
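A minimal sketch of that "same thread" behaviour, using only ctypes from the standard library. The names `on_event` and `c_on_event` are made up for illustration; the point is that a Python function invoked through a C function pointer runs on whichever thread made the C call, here the main thread.

```python
import ctypes
import threading

seen_threads = []

def on_event(value):
    # Record which OS thread is executing this Python code.
    seen_threads.append(threading.get_ident())
    return value * 2

# Wrap the Python function in a C function pointer. Calling the wrapper
# goes out to C and comes back into the interpreter, much like a slot
# being invoked from inside the Qt event loop.
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
c_on_event = CALLBACK(on_event)

result = c_on_event(21)
same_thread = seen_threads[0] == threading.get_ident()
print(result, same_thread)
```

No second "interpreter thread" is involved: the thread that called into C is the one that ends up executing the Python function.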

Both the slot and the callback are Python functions. Why does one execute in the python interpreter's thread while the other executes in the hardware DLL's thread?

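Because in the slot case it was the python.exe thread that called the C function to execute Python bytecode, while in the callback case it was the hardware DLL's thread that called a function in the Python DLL (through the wrapper) to execute Python bytecode. CPython is just C code, so any thread in the process can execute it. The slot still ends up on the python.exe thread because, with Qt's default connection type, a signal emitted from a thread other than the receiver's is queued and delivered by the event loop of the receiver's thread.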
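The same idea can be sketched with a stand-in for the hardware DLL's polling thread. Here `fake_hardware_poller` is a hypothetical substitute (a plain `threading.Thread`, not a real native thread from a vendor library), and `on_hardware_event` is an illustrative name. The callback runs on whichever thread invokes it.

```python
import ctypes
import threading

CALLBACK = ctypes.CFUNCTYPE(None, ctypes.c_int)
callback_threads = []

def on_hardware_event(code):
    # Runs on the thread that invoked the C function pointer.
    callback_threads.append(threading.get_ident())

c_callback = CALLBACK(on_hardware_event)

def fake_hardware_poller():
    # Hypothetical stand-in for the DLL's own polling thread.
    c_callback(1)

worker = threading.Thread(target=fake_hardware_poller)
worker.start()
worker.join()

# The very same callback, this time invoked from the main thread.
c_callback(2)

main_id = threading.get_ident()
from_worker = callback_threads[0] != main_id
from_main = callback_threads[1] == main_id
print(from_worker, from_main)
```

So the thread ID printed inside a callback is not an artifact of the logging module; it reflects the thread that actually made the call.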

Both the hardware callback and the Qt slot can manipulate variables in the Python environment, so I have to assume that I need to use some thread synchronization mechanism.

Yes, and one already exists: the Global Interpreter Lock (GIL). Every function that executes Python bytecode must acquire this lock before it starts and release it when it is no longer executing Python bytecode. The Qt bindings, ctypes, and other binding APIs such as pybind11 do this automatically. If you call into the Python DLL manually, you must acquire and release the GIL yourself around any code that executes Python bytecode: the interpreter will fail if the GIL is not held, and you may run into deadlocks if you do not release it properly.

Note that the GIL only protects the interpreter's own state. It does not make a multi-step update to your own data atomic, so if the callback and the slot both modify the same objects in several steps, guard that code with a threading.Lock. It works from any thread that runs Python code, including the hardware DLL's thread, and you do not need Qt's mutex classes for this.
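As a rough illustration of application-level locking on top of the GIL, the sketch below guards a small piece of shared state with `threading.Lock`. The `EventCounter` class and `fake_callback_thread` function are hypothetical; the worker threads stand in for the hardware callback thread, and `snapshot()` is what a slot on the GUI thread might call.

```python
import threading

class EventCounter:
    """Shared state touched by both the callback thread and the GUI thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.last_code = None

    def record(self, code):
        # Two related fields are updated together; the lock keeps a reader
        # from ever seeing one updated without the other.
        with self._lock:
            self.count += 1
            self.last_code = code

    def snapshot(self):
        with self._lock:
            return self.count, self.last_code

counter = EventCounter()

def fake_callback_thread():
    # Hypothetical stand-in for callbacks fired by the hardware library.
    for code in range(1000):
        counter.record(code)

threads = [threading.Thread(target=fake_callback_thread) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count, last_code = counter.snapshot()
print(count, last_code)
```

An alternative design is to have the callback do nothing except emit a signal (or push onto a `queue.Queue`) and let the GUI thread own all the state, which avoids shared mutation entirely.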