Tags: c++, c++11, pybind11

Any C++ macro that can export all member variables of a struct for pybind11


I have a simple structure like:

struct Config {
  bool option1;
  bool option2;
  int arg1;
};

Using pybind11, I have to export the member variables like:

py::class_<Config>(m, "Config")
    .def_readwrite("option1", &Config::option1)
    .def_readwrite("option2", &Config::option2)
    .def_readwrite("arg1", &Config::arg1);

Writing the above is fine when there are only a few such structs, but it becomes tedious when I have a large number of simple structures.

Is there a convenience macro that I can write like:

PYBIND_EXPORT_STRUCT(Config1);
PYBIND_EXPORT_STRUCT(Config2);
...

so that each call scans the given struct and exports all of its member variables?

Would it help if I wrote the structs in this form instead:

struct Config {
    ADD_PROPERTY(bool, option1);
    ADD_PROPERTY(bool, option2);
    ADD_PROPERTY(int, arg1);
};

My question involves two parts:

  1. To reflect a member variable back to its name string.
  2. To iterate through struct members.

I am aware of introspection to solve the first part, using typeid(arg1).name() to retrieve the name string.

For the second part, C++ has no direct support. However, I am trying to figure it out from some answers here.

The rest of the question is how to fuse these two parts into a working implementation of my imagined PYBIND_EXPORT_STRUCT() macro.

That said, I don't mind expressing my structs in a totally different representation (using macros, or as tuples, for example). Anything will do, as long as I don't have to enumerate my struct members a second time when exporting them with pybind11, and I can still use the variables like config1.option1 = true in C++ code.


Solution

    1. How not to solve the problem

    Neither of the approaches you are thinking of is viable or practical.

    I am aware of introspection to solve the first part, using typeid(arg1).name() to retrieve the name string.

    This is incorrect. C++ has RTTI, run-time type information, but that is very far from the “reflection” that C#, Java or Python have. In particular, the member function std::type_info::name() “Returns an implementation defined null-terminated character string containing the name of the type. No guarantees are given; in particular, the returned string can be identical for several types and change between invocations of the same program.” [highlights are mine -kkm] In fact, this program

    #include <iostream>
    #include <typeinfo>
    struct Config { int option; };
    int main() { std::cout << typeid(&Config::option).name() << "\n"; }
    

    prints, if compiled with GCC 11 on Linux x64,

    M6Configi
    

    which is fully standard-compliant. There goes your part #1 down the drain. A type does not contain a member name, and it's called Runtime Type Information, not Runtime Name Information, for a reason. You can even make an educated guess and decode the printed string: M = pointer to member, 6 = the next 6 characters name the struct type, Config = obvious, i = int. That is, a pointer to a member of Config, the member itself being of type int. But another compiler will encode (“mangle”, as this is called) the type differently.
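
    To drive the point home, here is a compiler-specific sketch (it assumes GCC or Clang, whose <cxxabi.h> header exposes abi::__cxa_demangle; nothing about it is standard C++): even after demangling, all you recover is the type, never the member's name.

    #include <cstdlib>
    #include <cxxabi.h>   // GCC/Clang only: the Itanium ABI demangler
    #include <iostream>
    #include <typeinfo>

    struct Config { int option; };

    int main() {
      int status = 0;
      // __cxa_demangle() returns a malloc()'d string; free() it when done.
      char* readable = abi::__cxa_demangle(
          typeid(&Config::option).name(), nullptr, nullptr, &status);
      std::cout << (status == 0 ? readable : "demangling failed") << "\n";
      std::free(readable);
    }

    On the same GCC 11 on Linux x64 setup, this prints int Config::*, the pointer-to-member type spelled out in full, and it still says nothing about the identifier option.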

    Regarding part #2, take that CppCon video presentation (from an answer you are linking to) for what it really is: a demonstration that C++14 metaprogramming is powerful enough to extract information about a POD type. As you can see, the presenter declares two functions for every member type you could possibly encounter (int, volatile int, const int, const volatile int, short, ...). Let's just stop here. All these types are different. In fact, when I changed the declaration of the lone structure member to volatile int option; in the little test program above, it printed a different mangled type name: M6ConfigVi.
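
    A tiny illustration of that last point, using nothing but C++11 type traits:

    #include <type_traits>

    // int and its cv-qualified variants are four distinct types, so a
    // metaprogram that matches members by type must handle every one of them.
    static_assert(!std::is_same<int, const int>::value, "distinct types");
    static_assert(!std::is_same<int, volatile int>::value, "distinct types");
    static_assert(!std::is_same<int, const volatile int>::value, "distinct types");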

    The CppCon presentation is a demonstration of what the machinery is capable of, not what it should be used for. To make an analogy, this is what a barrel roll of an aircraft at an airshow is to routine passenger airline operations. I would avoid barrel rolls in production code if I were you...

    In practice, this is a good stress test for a compiler. I used to get compiler crashes with far more modest metaprogramming constructs. Besides, you probably won't like the compilation time of all this kaboodle. Don't be surprised to sit and wait ten minutes for the compiler to finish in one of four ways: a crash; an internal error report; successful generation of incorrect code; or, fingers crossed, successful generation of correct code. You also need a deep, and I mean really, really deep, understanding of metaprogramming: how the compiler selects among template overloads, what an unevaluated context is, what SFINAE is, and so on. Simply speaking, just don't. It may work, but it is not worth the large amount of "framework" code needed to get the conference demo working, nor the uncertainty about compiler correctness w.r.t. such a tremendously complex metaprogram.

    2. How to solve the problem

    There is a very traditional way to do what you are trying to do, relying on plain old C preprocessor macros. The core idea is this:

    • You write the definitions of your structs as calls to function-like preprocessor macros in a separate file which does not itself define those macros (let's call it an "abstract definitions file," or ADF, for want of an accepted term).
    • The second file, the normal header that you include to get concrete declarations of your structures, defines these special macros to expand into ordinary C++ constructs, then includes the ADF, then (important!) #undefines them.
    • The third file, the one that creates the Python bindings, first includes that header, then defines the same macros differently (this is why the #undefs were important!), this time so that they expand into pybind11 syntactic constructs, and then includes the ADF a second time into the same compilation unit.

    Let's now put the whole thing together.

    The first file is the ADF, structs.absdef. I would not give it the traditional .h extension, to avoid confusing it with "normal" header files. The extension can be anything you want, but choosing one that is unique within the project helps signal to the reader that this is not a "normal" include file.

    /* structs.absdef -- abstract definition of data structures */
    
    #ifndef BEGIN_STRUCT_DEF
    #error "This file should be included only from structs.h or pybind.cc"
    #endif
    
    BEGIN_STRUCT_DEF(Config)
      STRUCT_MEMBER(Config, bool, option1)
      STRUCT_MEMBER(Config, bool, option2)
      STRUCT_MEMBER(Config, int, arg1)
    END_STRUCT_DEF()
    
    /* ... and then structs, structs and more structs ... */
    

    The #ifndef/#error/#endif is just to stop compilation immediately if the preprocessor macros aren't defined before including the file; otherwise, you'll get a buttload of compile errors, more likely misleading than helpful for diagnosing the problem.

    This file will be included into the second file, which is your normal C++ header defining all the structs in C++ syntax. This is the file that you include as a normal, plain and boring C++ header into your C++ sources and/or other include files, wherever you want the declarations of these structs to be visible.

    /* structs.h -- C++ concrete definitions of data structures */
    
    #ifndef MYPROJECT_STRUCTS__H
    #define MYPROJECT_STRUCTS__H
    
    #define BEGIN_STRUCT_DEF(stype)            struct stype {
    #define STRUCT_MEMBER(stype, mtype, name)    mtype name;
    #define END_STRUCT_DEF()                   };
    
    #include "structs.absdef"
    
    #undef BEGIN_STRUCT_DEF
    #undef STRUCT_MEMBER
    #undef END_STRUCT_DEF
    
    #endif  // MYPROJECT_STRUCTS__H
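
    For completeness, a quick usage sketch (my own addition, not one of the three files): once structs.h is included, Config is a plain aggregate again, so the config1.option1 = true style of access asked about in the question keeps working unchanged.

    #include "structs.h"

    int main() {
      Config config1;
      config1.option1 = true;   // ordinary member access, as before
      config1.arg1 = 42;
    }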
    

    One thing to note is that structs.h has include guards but the ADF doesn't. That is deliberate: the ADF is included twice into the compilation unit that makes the pybind11 calls. That compilation unit is the third file, and it is the special one: it transforms the same ADF definitions into pybind11 syntax. I have no idea how pybind11 works; I'm blindly copying your example.

    /* pybind.cc -- Generate pybind11 Python bindings */
    
    #include "pybind11.h" // All these #include   ...
    #include "other.h"    // ... directives stand ...
    #include "stuff.h"    // ... for the real McCoy.
    
    #include "structs.h"  /* You need "normal" C++ definitions, too! */
    
    // We rely here on structs.h having #undef'd its definitions of these.
    // The preprocessor does not allow silently redefining macros.
    #define BEGIN_STRUCT_DEF(stype)            py::class_<stype>(m, #stype)
    #define STRUCT_MEMBER(stype, mtype, name)   .def_readwrite(#name, &stype::name)
    #define END_STRUCT_DEF()                   ;
    
    void create_pybind_bindings() {
      // The ADF is included the second time in the CU.
      #include "structs.absdef"
    }
    
    // Not necessary, but customary to avoid polluting the preprocessor
    // namespace, unless the C++ source ends right here.
    #undef BEGIN_STRUCT_DEF
    #undef STRUCT_MEMBER
    #undef END_STRUCT_DEF
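
    As written above, the m inside the expanded py::class_<...>(m, ...) calls still has to come from somewhere. Below is a minimal sketch of one way to wire it up, assuming the real pybind11 headers; the module name example_module is made up for illustration. Pass the module object into the function, and call the function from the usual PYBIND11_MODULE entry point.

    /* Sketch only: assumes pybind11 is installed and that the three macro
       definitions above are in effect when structs.absdef is included. */
    #include <pybind11/pybind11.h>
    namespace py = pybind11;

    void create_pybind_bindings(py::module_ &m) {  // older pybind11 spells it py::module
      #include "structs.absdef"
    }

    PYBIND11_MODULE(example_module, m) {
      create_pybind_bindings(m);
    }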
    

    Two points that you should pay attention to.

    First, there must be no space between a function-like macro's name and the opening parenthesis in its definition:

    // Correct:
    #define FOO(x) ((x) + 42)
    // In this statement:
    int j = FOO(1);
    // `FOO(1)' expands by replacing `x' with `1' into:
    int j = ((1) + 42);
    
    // Incorrect:
    //         v--- A feral space attacks!!! Everyone seek shelter!!!
    #define BAR (x) ((x) + 42)
    // Since BAR is not a function-like macro, it expands literally
    // as defined into `(x) ((x) + 42)', such that this:
    int j = BAR(1);
    // expands into:
    int j = (x) ((x) + 42)(1);
    

    That is, BAR is substituted literally, exactly where it appears. What your compiler will have to say when it tries to digest the result is a load of garbage errors, and certainly not "error: you inserted a space between BAR and '('", so be careful.

    The second point is the preprocessor's stringizing operator #, which turns the function-like macro argument that follows it into a double-quoted string: #stype expands to "Config", in quotes, which is just what you need to pass to the pybind11 API (and #name likewise yields "option1" and so on).
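
    A self-contained illustration of the operator, independent of pybind11:

    // #x turns the macro argument into a string literal.
    #define NAME_OF(x) #x

    const char *s = NAME_OF(Config);   // expands to: const char *s = "Config";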

    3. Bonus: a peek under the hood

    Obviously, we don't have the files "pybind11.h", "other.h" and "stuff.h": they are just placeholder names, so I'll simply create empty ones. The three other files I copied literally from this answer. When you compile pybind.cc, the compiler driver invokes the C preprocessor first; we'll invoke it alone and examine its output. The command c++ -E <filename.cc> tells the compiler to run the preprocessor, but to print the result to stdout and stop instead of feeding it to the rest of the compiler.

    I'm condensing the output by removing runs of empty lines: the preprocessor strips comments and the directives it has processed, but still emits an empty line in their place to keep line numbers correct for any diagnostics produced by the later phases. The extra lines starting with # serve the later passes for the same purpose: they simply record the line number and file name being processed. You can safely ignore them.

    $ touch "pybind11.h" "other.h" "stuff.h"
    
    $ ls *.{cc,h,absdef}
    other.h  pybind.cc  pybind11.h  structs.absdef  structs.h  stuff.h
    
    $ c++ -E pybind.cc
    # 1 "pybind.cc"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "/usr/include/stdc-predef.h" 1 3 4
    # 1 "<command-line>" 2
    # 1 "pybind.cc"
    
    # 1 "pybind11.h" 1
    # 4 "pybind.cc" 2
    # 1 "other.h" 1
    # 5 "pybind.cc" 2
    # 1 "stuff.h" 1
    # 6 "pybind.cc" 2
    
    # 1 "structs.h" 1
    # 10 "structs.h"
    # 1 "structs.absdef" 1
    
    struct Config {
      bool option1;
      bool option2;
      int arg1;
    };
    # 11 "structs.h" 2
    # 8 "pybind.cc" 2
    
    void create_pybind_bindings() {
    # 1 "structs.absdef" 1
    
    py::class_<Config>(m, "Config")
      .def_readwrite("option1", &Config::option1)
      .def_readwrite("option2", &Config::option2)
      .def_readwrite("arg1", &Config::arg1)
    ;
    # 15 "pybind.cc" 2
    }
    

    Or, with the hints of the form # number "file" flags removed (they are only needed so the compiler can print proper diagnostic context, such as "structs.absdef:5, included from structs.h:10: error: ..."), the compilation unit that the actual compiler processes is a nice, clean, exact copy of your desired code:

    struct Config {
      bool option1;
      bool option2;
      int arg1;
    };
    
    void create_pybind_bindings() {
    py::class_<Config>(m, "Config")
      .def_readwrite("option1", &Config::option1)
      .def_readwrite("option2", &Config::option2)
      .def_readwrite("arg1", &Config::arg1)
    ;
    }
    

    4. Colophon, or A bit of smartassery and a bit of history

    • Not every new technology is better for everything simply because it's new.
    • The preprocessor is in fact slightly older than the C language itself. 49 years old, to be exact. C adopted the preprocessor used inside Bell Labs for other languages.