
How to call a python function from C++ with pybind11?


Please consider the following C++ pybind11 program:

#include <pybind11/embed.h>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{};

    py::dict locals;

    py::exec(R"(

        import sys

        def f():
            print(sys.version)

    )", py::globals(), locals);

    locals["f"]();  // <-- ERROR
}

The py::exec call and the enclosed import sys call both succeed, but the call locals["f"]() throws an exception:

NameError: name 'sys' is not defined

on the first line of function f.

The expected behaviour is that the program prints the Python version string.

Any ideas?

Update:

I modified the program as suggested by @DavidW:

#include <pybind11/embed.h>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{};

    py::dict globals = py::globals();

    py::exec(R"(

        import sys

        def f():
            print(sys.version)

    )", globals, globals);

    globals["f"]();  // <-- WORKS NOW
}

and it now works.

I'm not 100% sure I understand what is going on, so I would appreciate an explanation.

(In particular, does modifying the shared globals/locals dictionary affect any other scripts? Is there some global dictionary, part of the Python interpreter, that the exec'd script is modifying? Or does py::globals() take a copy of that state, so that the exec'd script is isolated from other scripts?)

Update 2:

So it looks like having globals and locals be the same dictionary is the default state:

$ python
>>> globals() == locals()
True
>>> from __main__ import __dict__ as x
>>> x == globals()
True
>>> x == locals()
True

...and that the default value for the two is __main__.__dict__, whatever that is (__main__.__dict__ is the dictionary returned by py::globals())

I'm still not clear what exactly __main__.__dict__ is.


Solution

  • So the initial problem (solved in the comments) was that passing different dictionaries as globals and locals causes the code to be executed as if it were inside a class body (see the Python documentation for exec; the pybind11 function behaves basically the same):

    Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

    A function scope doesn't look up names defined in its enclosing class body, so this wouldn't work:

    class C:
        import sys
        def f():
            print(sys.version)
            # but C.sys.version would work
    

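    and thus your code doesn't work: import sys binds sys in locals, but f looks the name up in globals (and then in the builtins), where it was never defined.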
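The class-like scoping described above can be reproduced in plain Python, without pybind11. This is a minimal sketch, assuming the built-in exec behaves as py::exec does here (as stated above); the names source, separate_globals, separate_locals and shared are made up for illustration.

```python
import textwrap

# The same script the question passes to py::exec, returning instead of printing.
source = textwrap.dedent("""
    import sys

    def f():
        return sys.version
""")

# Case 1: two distinct dictionaries. `import sys` and `def f` both bind
# their names in the locals dict, but f resolves `sys` through its globals.
separate_globals, separate_locals = {}, {}
exec(source, separate_globals, separate_locals)
try:
    separate_locals["f"]()
    separate_failed = False
except NameError:
    separate_failed = True

# Case 2: one dictionary used for both roles, as at module level.
shared = {}
exec(source, shared, shared)
shared_version = shared["f"]()
```

After running this, sys is a key of separate_locals but not of separate_globals, which is why the first call raises NameError and the second does not.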


    pybind11::globals returns a dictionary that's shared in a number of places:

    Return a dictionary representing the global variables in the current execution frame, or __main__.__dict__ if there is no frame (usually when the interpreter is embedded).

    and thus any modifications to this dictionary will persist (which probably isn't what you want!). In your case it's probably __main__.__dict__, but in general "the current execution frame" might change from call to call, depending on how often you cross the C++/Python boundary. For example, if a Python function calls a C++ function that modifies globals(), then exactly which dictionary you modify depends on the caller.

    My advice would be to create a new, empty dict instead and pass that to exec. This ensures that you run in a fresh, non-shared namespace.
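A rough Python illustration of that advice, assuming the built-in exec stands in for py::exec; fresh_namespace and isolated_helper are hypothetical names used only for this sketch.

```python
import __main__

source = """
import sys

def isolated_helper():
    return sys.version_info.major
"""

# A new, empty dict serves as both globals and locals
# (exec uses the globals dict for locals when only one is given).
fresh_namespace = {}
exec(source, fresh_namespace)

# The function is reachable through the fresh dict and can see `sys`...
major_version = fresh_namespace["isolated_helper"]()

# ...but nothing was written into the shared __main__ namespace.
leaked_into_main = "isolated_helper" in vars(__main__)
```

In pybind11 the analogous step would presumably be to construct an empty py::dict and pass it as the globals argument of py::exec instead of py::globals().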


    __main__ is just a special module that represents the "top level code environment". Like any module, it has a __dict__. When running in the REPL, that dictionary is the global scope. From the pybind11 point of view it's just a module with a dict, and you probably shouldn't write into it casually (unless you have deliberately decided to put something there in order to share it globally).


    Regarding __builtins__: the documentation for the Python exec function says

    If the globals dictionary does not contain a value for the key __builtins__, a reference to the dictionary of the built-in module builtins is inserted under that key. That way you can control what builtins are available to the executed code by inserting your own __builtins__ dictionary into globals before passing it to exec().

    and looking at the code for PyRun_String, which pybind11's exec calls, the same applies there.

    This dictionary seems to be sufficient for the builtin functions to be looked up correctly. (If that isn't the case, you can always do pybind11::dict(pybind11::module::import("builtins").attr("__dict__")) to make a copy of the builtin dict and use that instead. However, I don't believe it's necessary.)
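The quoted behaviour of exec can be checked from Python directly. A minimal sketch using only the built-in exec, so it shows the Python side rather than PyRun_String itself; the variable names are made up for illustration.

```python
# An empty dict has no "__builtins__" key before exec runs...
namespace = {}
had_builtins_before = "__builtins__" in namespace

# ...yet builtin functions such as len still resolve inside the script,
# because exec inserts the key on its own.
exec("length = len('abc')", namespace)
has_builtins_after = "__builtins__" in namespace

# Supplying your own __builtins__ restricts what the script can see.
restricted = {"__builtins__": {"len": len}}
exec("length = len('abcd')", restricted)
try:
    exec("print('hidden')", restricted)
    print_blocked = False
except NameError:
    print_blocked = True
```

Here namespace["length"] is 3 and restricted["length"] is 4, while print is unavailable in the restricted namespace.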