
Certain Python functions hang when called from the Python/C API


I'm using the Python/C API to call Python functions from C++. My problem is that when I call a Python function that in turn imports scipy.optimize.least_squares, the call hangs. Here are the details.

I'm calling my Python function testfunc1(foo,bar=True) in module test_clib.py as follows:

PyObject* pTestModuleName = PyUnicode_FromString( "test_clib" );
PyObject* pTestModule     = PyImport_Import( pTestModuleName );
PyObject* pFunction       = PyObject_GetAttrString( pTestModule, "testfunc1" );

const char* str = "foo";

PyObject* pArgList = Py_BuildValue( "(s)", str );

PyObject* pKeywords = PyDict_New();
PyDict_SetItemString( pKeywords, "bar", Py_True );

PyObject* pReturn = PyObject_Call( pFunction, pArgList, pKeywords );

When testfunc1() imports scipy.optimize.least_squares, it hangs. It doesn't even have to call least_squares; it hangs on this line:

from scipy.optimize import least_squares

But, when I boil it down to just a simple test program like I've shown here, it works. Where it fails is when the above snippet is part of my larger program.

So I realize this is not going to be something that someone else can directly try but maybe someone can spot something that I'm missing.

Maybe this will be helpful: when I run the test program in gdb, it reports that about a dozen threads are started. My simple test program creates no threads itself, so all of those must come from the Python/C API. When I run gdb on my larger program, which makes the same Python calls, I don't see all those threads starting; it just hangs. When I interrupt it, it breaks here:

 ^C
 Thread 18 "acamd" received signal SIGINT, Interrupt.
 0x00007ffff73ca7e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 (gdb) where
 #0  0x00007ffff73ca7e8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
 #1  0x00007ffff793bef5 in take_gil () from /lib64/libpython3.9.so.1.0
 #2  0x00007ffff793c112 in PyEval_RestoreThread () from /lib64/libpython3.9.so.1.0
 #3  0x00007ffff7a5dd8c in PyGILState_Ensure () from /lib64/libpython3.9.so.1.0
 #4  0x00007fffb83bd6c5 in pybind11::detail::get_internals() () from /usr/local/lib64/python3.9/site-packages/scipy/spatial/_distance_pybind.cpython-39-x86_64-linux-gnu.so
 #5  0x00007fffb83ae22c in PyInit__distance_pybind () from /usr/local/lib64/python3.9/site-packages/scipy/spatial/_distance_pybind.cpython-39-x86_64-linux-gnu.so
 #6  0x00007ffff7a765bc in _imp_create_dynamic () from /lib64/libpython3.9.so.1.0
 #7  0x00007ffff7989ac8 in cfunction_vectorcall_FASTCALL () from /lib64/libpython3.9.so.1.0
 #8  0x00007ffff799a8eb in PyObject_Call () from /lib64/libpython3.9.so.1.0
 #9  0x00007ffff7a089d8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.9.so.1.0
 #10 0x00007ffff79f084e in _PyEval_EvalCode () from /lib64/libpython3.9.so.1.0
 #11 0x00007ffff79f1e2b in _PyFunction_Vectorcall () from /lib64/libpython3.9.so.1.0
    :   :   :

which continues on for some 200 more lines.


Solution

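  • The backtrace (specifically take_gil, reached through PyGILState_Ensure and PyEval_RestoreThread) shows that the thread is blocked waiting to acquire the GIL (global interpreter lock). The request comes from pybind11's initialization of scipy.spatial._distance_pybind, which is loaded as part of the scipy import.

    Things that can lead to this:

    1. Another C++ thread holds the GIL and never releases it.
    2. Another thread has acquired the GIL more times than it has released it, so the acquire and release calls are unbalanced.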