Tags: python, c++, c, multicore, python-c-api

Using all CPU cores in Python C API


Is it possible, with the latest Python C API, to use all CPU cores in any way?

Because of the GIL, Python can use only one CPU core at a time, so performance is poor on a multi-core machine.

But the C API has a not-so-well-documented ability to create several interpreters within one C++ program.

Is it possible, by using several interpreters (even one interpreter per C++ thread), to have a separate GIL within each thread/interpreter, so that every C++ thread can run on a separate core and all CPU cores are fully used?

If I understand correctly, the docs say there is only a single GIL per process, so different interpreters created by Py_NewInterpreter() share the same GIL and cannot each have their own. That means if I acquire the GIL, all other interpreters are blocked. Maybe I'm misreading the docs, though...

The task is this: inside a C++ program, in each separate thread, I want to execute PyRun_String(...); the threads will not share anything. Each such PyRun_String() may run in a separate interpreter if that helps.

Since the C++ threads share nothing (and hence share no PyObject* instances), maybe it is possible not to acquire the GIL at all? I don't know whether the global state (global variables) of the Python C API needs GIL protection, or whether only PyObject* instances need to be protected. If the C++ threads don't share any PyObject*, maybe the GIL doesn't need to be acquired at all; does anybody know?
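
Below is a minimal sketch of that per-thread setup, assuming an embedded CPython 3.9 and the standard GIL rules; the thread count and the Python snippet are placeholders. It shows why the GIL still has to be acquired even though the threads share no PyObject*: the allocator, the garbage collector and other interpreter internals are process-wide shared state.

    // Build against the Python 3.9 headers and link with libpython3.9 (assumed version).
    #include <Python.h>
    #include <thread>
    #include <vector>

    // Every worker must take the GIL before any C API call, even though the
    // threads share no PyObject* between them.
    static void worker()
    {
        PyGILState_STATE gstate = PyGILState_Ensure();   // acquire the GIL, attach a thread state

        PyObject* globals = PyDict_New();                // private namespace for this thread
        PyDict_SetItemString(globals, "__builtins__", PyEval_GetBuiltins());

        // Placeholder workload: only one thread at a time actually executes it,
        // because all threads serialize on the single GIL.
        PyObject* result = PyRun_String("sum(i * i for i in range(1_000_000))",
                                        Py_eval_input, globals, globals);
        if (!result)
            PyErr_Print();
        Py_XDECREF(result);
        Py_DECREF(globals);

        PyGILState_Release(gstate);                      // release the GIL, detach the thread state
    }

    int main()
    {
        Py_Initialize();
        PyThreadState* mainState = PyEval_SaveThread();  // drop the GIL so the workers can take it

        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i)                      // 4 threads is an arbitrary placeholder
            threads.emplace_back(worker);
        for (auto& t : threads)
            t.join();

        PyEval_RestoreThread(mainState);
        Py_FinalizeEx();
        return 0;
    }

Running something like this shows the limitation the question is about: CPU usage stays at roughly one core no matter how many threads are started, because only one of them holds the GIL at any moment.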

Of course I know it is possible to spawn several processes running this C++ program, but right now I want to understand whether the task (using 100% of all CPU cores) is solvable within a single C++ process.

I was also thinking it might be possible via the following approach: the Python C API is linked via python39.lib, which has some global C variables, and these global variables hold the state of the C interpreter. Maybe the library could somehow be linked so that all global variables go into a relocatable region, and each C++ thread could then get its own copy of that memory region. Every thread would have its own copy of the globals, and thus a completely separate interpreter state. But I don't know of any way to make global variables relocatable for a single given .lib file; do you know of any way to do that?


Solution

  • Currently, CPython uses one shared GIL for all interpreters. The GIL needs to be held while running Python code in order to protect the interpreter's internal structures. Because of this, Python code cannot be executed concurrently, even in separate interpreters (see the sketch after this list).

    Python 3.10 will have incomplete support for per-interpreter GILs ([subinterpreters] Meta issue: per-interpreter GIL), but it needs to be enabled at build time with --experimental-isolated-subinterpreters.
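
As a rough illustration of the "one sub-interpreter per C++ thread" idea, here is a minimal sketch assuming a default CPython 3.9/3.10 build (without --experimental-isolated-subinterpreters); the thread count and the script string are placeholders. It uses only documented calls (PyThreadState_New, PyEval_RestoreThread, Py_NewInterpreter, Py_EndInterpreter) and demonstrates the point above: the sub-interpreters keep separate module and object state, but they still take turns on the single shared GIL, so this alone does not unlock all cores.

    #include <Python.h>
    #include <thread>
    #include <vector>

    // One sub-interpreter per C++ thread. The interpreters are isolated from
    // each other, but on a default build their Python code is still serialized
    // by the one process-wide GIL.
    static void subinterpreter_worker(PyInterpreterState* mainInterp)
    {
        // Bind a temporary thread state to the main interpreter and take the GIL;
        // Py_NewInterpreter() must be called with the GIL held.
        PyThreadState* ts = PyThreadState_New(mainInterp);
        PyEval_RestoreThread(ts);

        PyThreadState* sub = Py_NewInterpreter();        // swaps to the sub-interpreter's thread state
        if (sub) {
            PyRun_SimpleString("print('running in a sub-interpreter')");  // placeholder script
            Py_EndInterpreter(sub);                      // GIL still held, current thread state is now NULL
        }

        PyThreadState_Swap(ts);                          // make our main-interpreter state current again
        PyThreadState_Clear(ts);
        PyThreadState_DeleteCurrent();                   // deletes ts and releases the GIL
    }

    int main()
    {
        Py_Initialize();
        PyInterpreterState* mainInterp = PyInterpreterState_Main();
        PyThreadState* mainState = PyEval_SaveThread();  // release the GIL for the workers

        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i)                      // 4 threads is an arbitrary placeholder
            threads.emplace_back(subinterpreter_worker, mainInterp);
        for (auto& t : threads)
            t.join();

        PyEval_RestoreThread(mainState);
        Py_FinalizeEx();
        return 0;
    }

The PyGILState_* API is deliberately avoided in the workers here, since the docs note it assumes a single (main) interpreter and is unsupported in combination with sub-interpreters.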