Tags: python, c++, cpython, python-3.10

Embedded Python (3.10) - Py_FinalizeEx hangs/deadlock on "threading._shutdown()"


I am embedding Python in a C++ application, and I think I am misusing PyGILState_Ensure/PyGILState_Release, which eventually causes Py_FinalizeEx to hang in threading._shutdown() (called by Py_FinalizeEx) while join()-ing the threads.

During initialization I am calling:

Py_InitializeEx(0); // Skip installing signal handlers
auto gil = PyGILState_Ensure();
// ... running Python code
PyGILState_Release(gil);

Whenever a thread uses Python (there can be multiple C threads), I use pyscope at the beginning of the function:

#define pyscope() \
    PyGILState_STATE gstate = PyGILState_Ensure(); \
    utils::scope_guard sggstate([&]() \
    { \
        PyGILState_Release(gstate); \
    });
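
For comparison, the same Ensure/Release pairing can be written as a small non-copyable RAII class instead of a macro. This is only a sketch: paired_guard is a hypothetical name, not part of the Python C API, and it is kept generic over the acquire/release callables so that it compiles without Python.h. It assumes C++17.

```cpp
#include <utility>

// Hypothetical helper: calls acquire() on construction and passes the
// returned token to release() on destruction, so the two always pair up.
template <class Acquire, class Release>
class paired_guard {
public:
    paired_guard(Acquire acquire, Release release)
        : release_(std::move(release)), token_(acquire()) {}

    ~paired_guard() { release_(token_); }

    // Copying would run release() twice for a single acquire().
    paired_guard(const paired_guard&) = delete;
    paired_guard& operator=(const paired_guard&) = delete;

private:
    Release release_;
    decltype(std::declval<Acquire&>()()) token_;  // whatever acquire() returns
};
```

With Python.h included, `paired_guard gil(PyGILState_Ensure, PyGILState_Release);` at the top of a function should play the same role as pyscope().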

When I want to shut Python down, I call the following (from a C thread, not necessarily the one that initialized Python):

PyGILState_STATE gstate = PyGILState_Ensure();
int res = Py_FinalizeEx(); // <--- HANGS!!!

Debugging and reading the code revealed that it hangs while joining threads. I can reproduce the deadlock by running the following code with PyRun_SimpleString right before Py_FinalizeEx:

import threading
for t in threading.enumerate():
    print('get_ident: {} ; native: {}'.format(t.ident, t.native_id))
    if threading.current_thread().ident != t.ident:
        t.join()

Lastly, I am not using PyEval_SaveThread/PyEval_RestoreThread. Maybe I have to, but I don't understand how to combine them with PyGILState_Ensure/PyGILState_Release, since as far as I can see they also take and drop the GIL internally.

Any ideas why the deadlock occurs and how to resolve this issue? Thanks!


Solution

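  • This happens because every GIL acquisition must be matched by a release. Py_InitializeEx leaves the calling (main) thread holding the GIL, so a PyGILState_Ensure/PyGILState_Release pair in that thread only restores that state and the GIL stays held. Release the initial hold by calling PyEval_SaveThread in the main thread right after initialization: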

    Py_InitializeEx(0); // Skip installing signal handlers
    PyThreadState* tstate = PyEval_SaveThread(); // release the GIL held since initialization
    
    auto gil = PyGILState_Ensure();
    // ... running Python code
    PyGILState_Release(gil);
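
The pairing rule above can be illustrated with a toy, single-threaded model of the documented contract that PyGILState_Release restores the state found by the matching PyGILState_Ensure. This is a sketch only: it is not CPython's implementation, it ignores other threads entirely, and every name in it (toy_state, toy_interpreter, held_after_pair) is hypothetical.

```cpp
// Toy model of the initializing thread's view of the GIL.
enum class toy_state { locked, unlocked };

struct toy_interpreter {
    bool gil_held = true;  // models the state right after Py_InitializeEx()

    // Models PyGILState_Ensure(): remember whether the GIL was already held.
    toy_state ensure() {
        toy_state previous = gil_held ? toy_state::locked : toy_state::unlocked;
        gil_held = true;
        return previous;
    }

    // Models PyGILState_Release(): restore what the matching ensure() saw.
    void release(toy_state previous) {
        gil_held = (previous == toy_state::locked);
    }

    // Models PyEval_SaveThread(): give up the GIL unconditionally.
    void save_thread() { gil_held = false; }
};

// Reports whether the initializing thread still holds the GIL after one
// ensure()/release() pair, with or without a prior save_thread().
inline bool held_after_pair(bool call_save_thread_first) {
    toy_interpreter interp;
    if (call_save_thread_first) interp.save_thread();
    toy_state previous = interp.ensure();
    interp.release(previous);
    return interp.gil_held;
}
```

In this model, held_after_pair(false) reports that the GIL is still held after the pair, which is the situation in which other threads would wait in PyGILState_Ensure, while held_after_pair(true) reports that it is free.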
    

    Minimal Reproducible Example

    #include <cstdio>    // fprintf
    #include <cstdlib>   // exit
    #include <iostream>
    #include <Python.h>
    #include <string>
    #include <windows.h>
    #include <processenv.h>
    #include <thread>
    #include <utility>   // std::move
    
    namespace util {
        template<class F>
        class scope_guard {
            F func_;
            bool active_;
        public:
            scope_guard(F func) : func_(std::move(func)), active_(true) { }
            ~scope_guard() {
                if (active_) func_();
            }
        };
    
    } // namespace util
    
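    // Acquire the GIL now and release it at scope exit.
    // Needs C++17 (class template argument deduction for scope_guard).
    #define pyscope() \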
        PyGILState_STATE gstate = PyGILState_Ensure(); \
        util::scope_guard sggstate([&]() \
        { \
            PyGILState_Release(gstate); \
        })
    
    void worker()
    {
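        // Acquire the GIL and keep it: Py_FinalizeEx() is called with the GIL
        // held, and afterwards there is no interpreter left to release it to.
        PyGILState_STATE gstate = PyGILState_Ensure();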
    
        PyRun_SimpleString("from time import time,ctime\n"
            "print('Today is', ctime(time()))\n");
        if (Py_FinalizeEx() < 0) {
            exit(120);
        }
    }
    void worker2()
    {
        pyscope();
        PyRun_SimpleString("from time import time,ctime\n"
            "print('Today is', ctime(time()))\n");
    }
    
    int main(int argc, char *argv[])
    {
    
        SetEnvironmentVariableA("PYTHONHOME", "C:\\Users\\blabla\\.conda\\envs\\py310");
        wchar_t *program = Py_DecodeLocale(argv[0], NULL);
        if (program == NULL) {
            fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
            exit(1);
        }
        Py_SetProgramName(program);
        Py_Initialize();
        // Release the GIL that Py_Initialize() left held by the main thread.
        PyThreadState* tstate = PyEval_SaveThread();
    
        auto thread2 = std::thread(worker2);
        thread2.join();
    
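        // The main thread can take the GIL again like any other thread.
        // worker() waits in PyGILState_Ensure() until it is released here.
        auto GIL = PyGILState_Ensure();
        auto thread1 = std::thread(worker);
        PyGILState_Release(GIL);
        thread1.join();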
    
        PyMem_RawFree(program);
        return 0;
    }