How can I check if a thread holds the GIL with sub-interpreters?

I am working on changes to a library that embeds Python. The changes require me to use sub-interpreters so that the Python state can be reset without calling Py_Finalize (since calling Py_Initialize again afterwards is a no-no).

I am only somewhat familiar with the library, but I am increasingly discovering places where PyGILState_Ensure and other PyGILState_* functions are being used to acquire the GIL in response to some external callback. Some of these callbacks originate from outside Python, so our thread certainly doesn't hold the GIL, but sometimes the callback originates from within Python, so we definitely hold the GIL.

After switching to sub-interpreters, I almost immediately saw a deadlock on a line calling PyGILState_Ensure: it called PyEval_RestoreThread even though it was clearly already being executed from within Python (and so the GIL was held).

For what it's worth, I have verified that a line that calls PyEval_RestoreThread does get executed before this call to PyGILState_Ensure, well before the first call into Python.

I am using Python 3.8.2. Clearly, the documentation isn't lying when it says:

Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.

It is quite a lot of work to refactor the library so that it tracks internally if the GIL is held or not, and seems rather silly. There should be a way to determine if the GIL is held! However, the only function I can find is PyGILState_Check, but that's a member of the forbidden PyGILState API. I'm not sure it'll work. Is there a canonical way to do this with sub-interpreters?
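To be concrete about what "tracking it internally" would mean, here is a minimal sketch of the per-thread bookkeeping I would have to add. Everything in it is hypothetical: `fake_gil` is a plain mutex standing in for the GIL, and `GilGuard`, `gil_depth`, `this_thread_holds_gil` and `callback` are names made up for illustration. Real code would acquire and release with the thread state that belongs to the calling thread instead of locking a mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Stand-in for the GIL (assumption: real code would use PyEval_RestoreThread /
// PyEval_SaveThread with this thread's own thread state instead of a mutex).
static std::mutex fake_gil;

// Per-thread nesting depth: > 0 means this thread currently holds the lock.
static thread_local int gil_depth = 0;

// Hypothetical RAII guard: only the outermost guard on a thread locks/unlocks.
class GilGuard {
public:
    GilGuard() {
        if (gil_depth++ == 0) fake_gil.lock();
    }
    ~GilGuard() {
        if (--gil_depth == 0) fake_gil.unlock();
    }
    GilGuard(const GilGuard&) = delete;
    GilGuard& operator=(const GilGuard&) = delete;
};

// Answers "does THIS thread hold the lock", not "is the lock held at all".
bool this_thread_holds_gil() { return gil_depth > 0; }

// A callback that may be entered with or without the lock already held.
int callback() {
    GilGuard guard;      // no extra locking if this thread already holds it
    return gil_depth;    // nesting depth seen inside the callback
}
```

This only works if every acquisition in the library goes through the guard, which is exactly the refactoring I was hoping to avoid.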

Solution

I've been pondering this line in the documentation:

Also note that combining this functionality with PyGILState_* APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters.

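I suspect the issue is that the PyGILState_* API relies on thread-local storage: it remembers one thread state per OS thread, and that no longer matches reality once the same thread can be running under a sub-interpreter's thread state.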
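Here is a toy model of what I think is going on. It is not CPython's code: `ThreadState`, `tls_state`, `current_state` and `ensure_would_acquire` are made-up names, and the model assumes that PyGILState_Ensure looks up the thread state stored for the OS thread and only skips acquiring the GIL when that stored state is the one currently running.

```cpp
#include <cassert>

// Toy stand-in for a Python thread state (assumption, not the real struct).
struct ThreadState { int interpreter_id; };

// One slot per OS thread, modelling the thread-local storage lookup.
static thread_local ThreadState* tls_state = nullptr;

// The thread state that is currently running, i.e. the one holding the GIL.
static ThreadState* current_state = nullptr;

// Models the decision at the top of PyGILState_Ensure: true means the call
// would go on to acquire the GIL, false means it treats it as already held.
bool ensure_would_acquire() {
    return tls_state == nullptr || tls_state != current_state;
}
```

In this model, a thread that stored the main interpreter's state and then switched to a sub-interpreter's state still holds the GIL, but `ensure_would_acquire()` returns true, so the real call would block waiting for a lock its own thread already holds.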

I've come to think that it's actually not really possible to tell if the GIL is held by the application. There's no central static place where Python stores that the GIL is held, because it's either held by "you" (in your external code) or by the Python code. It's always held by someone. So the question of "is the GIL held" isn't the question the PyGILState API is asking. It's asking "does this thread hold the GIL", which makes it easier to have multiple non-Python threads interacting with the interpreter.

I overcame this issue by restoring the bijection as best I could by creating a separate thread per sub-interpreter, with the order of operations being very strictly as follows:

1. Grab the GIL in the main thread, either explicitly or with Py_Initialize (if this is the first time). Be very careful: the thread state from Py_Initialize must only ever be used in the main thread. Don't restore it to another thread, because some module might use the PyGILState_* API and the deadlock will happen again.
2. Create the thread. I just used std::thread.
3. Spawn the sub-interpreter with Py_NewInterpreter in the new thread. Be very careful: this will give you a new thread state. As with the main thread state, this thread state must only be used from the thread it was created in.
4. Release the GIL in the new thread when you're ready for Python to do its thing.
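One way to enforce the "only from this thread" rule is to funnel every call for a given sub-interpreter through its own thread. The following is a minimal sketch of that idea, not the code I actually shipped: `InterpreterThread` and its members are hypothetical names, and the places where the Python C API calls would go are only marked with comments.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper: one OS thread per sub-interpreter. Every task that
// touches that interpreter is posted here, so its thread state is never
// used from any other thread.
class InterpreterThread {
public:
    InterpreterThread() : worker_([this] { run(); }) {}

    ~InterpreterThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Runs `task` on the interpreter's thread and waits until it has finished.
    void call(std::function<void()> task) {
        if (std::this_thread::get_id() == worker_.get_id()) {
            task();  // already on the interpreter's thread: run inline
            return;
        }
        std::packaged_task<void()> job(std::move(task));
        std::future<void> done = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
        done.get();  // rethrows if the task threw
    }

    std::thread::id thread_id() const { return worker_.get_id(); }

private:
    void run() {
        // Real code: create the sub-interpreter here with Py_NewInterpreter,
        // keep the returned thread state local to this thread, release the GIL.
        for (;;) {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) break;  // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            // Real code: take the GIL with this thread's state, run, release.
            job();
        }
        // Real code: end the sub-interpreter from this same thread.
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist first
};
```

A callback that arrives on some other thread would go through `call`, while one that originates from Python on the interpreter's own thread runs inline instead of waiting on itself.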

Now, there are some gotchas I discovered:

- asyncio in Python 3.8-3.9 has a use-after-free bug where the first interpreter that loads it manages some resources. If that interpreter is ended (releasing those resources) and a new interpreter then imports asyncio, there will be a segfault. I overcame this by manually loading asyncio through the C API in the main interpreter, since that one lives forever.
- Many libraries, including numpy, lxml, and several networking libraries, will have trouble with multiple sub-interpreters. I believe that Python itself is enforcing this: importing any of these libraries in a second interpreter raises an ImportError with the message "Interpreter change detected - This module can only be loaded into one interpreter per process." So far this seems to be an insurmountable issue for me, since I do require numpy in my application.