c++ pybind11 yaml-cpp

Using pybind11 to wrap a yaml-cpp iterator


I am trying to wrap some of the yaml-cpp code with pybind11. I realize there is a Python module for manipulating YAML files, but I would appreciate help with this approach; I am just trying to get familiar with pybind11.

Specifically, I would like to wrap the iterator for a YAML::Node, but the iterator does not return a YAML::Node; it returns a YAML::detail::iterator_value. How do I get from this type back to a YAML::Node in the iterator lambda function? Here are the relevant parts of my code.

utilities_py.cc

#include "yaml-cpp/yaml.h"
#include "pybind11/pybind11.h"

PYBIND11_MODULE(utilities, m) {
    namespace py = pybind11;

    py::class_<YAML::detail::iterator_value>(m, "YamlDetailIteratorValue")
        .def(py::init<>());

    py::class_<YAML::Node>(m, "YamlNode")
        .def(py::init<const std::string &>())
        .def("__getitem__",
             [](const YAML::Node node, const std::string key) {
                 return node[key];
             })
        .def("__iter__",
             [](const YAML::Node &node) {
                 return py::make_iterator(node.begin(), node.end());
             },
             py::keep_alive<0, 1>());

    m.def("load_file", &YAML::LoadFile, "");
}

test_utilities_py.py

from utilities import load_file

test_node = load_file('test.yaml')
for nodelette in test_node:
    prop = nodelette['prop']

And I get the following error:

TypeError: __getitem__: incompatible function arguments. The following argument types are supported:
    1. (arg0: utilities.YamlNode, arg1: str) -> utilities.YamlNode

Invoked with: <utilities.YamlDetailIteratorValue object at 0x7f8babc446f0>, 'prop'

Solution

  • You are close. If you look at the source, YAML::detail::iterator_value derives from YAML::Node, so the binding has to declare that relationship. It also derives from std::pair<YAML::Node, YAML::Node>, which needs to be handled in some way as well.

    struct iterator_value : public Node, std::pair<Node, Node> {
    

    When that gets bound, we'll have to make sure that Node is bound as the parent class. That will look like:

    py::class_<YAML::detail::iterator_value, YAML::Node>(m, "YamlDetailIteratorValue")
    

    Now you have all the Node methods when you iterate, which is good! But you will run into trouble because iterator_value also inherits from std::pair. As far as I know, there is no way to use that as a parent type in pybind11, even though pybind11 has automatic conversions for pairs (there is bind_vector and bind_map, but no bind_pair). I think you could write your own binding for it, but I'm not sure that is necessary. What you really need to do is inspect the type of the Node you are about to iterate over, and then iterate differently depending on whether it is a map or a sequence. This mirrors the C++ API, which has a single iterator type for both sequences and maps, with certain functions failing if called in the wrong context.

    Here is how I ended up solving the problem:

    #include <string>

    #include "pybind11/pybind11.h"
    #include "yaml-cpp/yaml.h"

    namespace py = pybind11;

    PYBIND11_MODULE(utilities, m) {
        // Expose the node kinds so Python code can branch on node.type().
        py::enum_<YAML::NodeType::value>(m, "NodeType")
            .value("Undefined", YAML::NodeType::Undefined)
            .value("Null", YAML::NodeType::Null)
            .value("Scalar", YAML::NodeType::Scalar)
            .value("Sequence", YAML::NodeType::Sequence)
            .value("Map", YAML::NodeType::Map);

        py::class_<YAML::Node>(m, "YamlNode")
            .def(py::init<const std::string &>())
            .def("__getitem__",
                 [](const YAML::Node &node, const std::string &key) {
                     return node[key];
                 })
            .def("__iter__",
                 [](const YAML::Node &node) {
                     return py::make_iterator(node.begin(), node.end());
                 },
                 py::keep_alive<0, 1>())
            // Emit the node back to YAML text so print() works.
            .def("__str__",
                 [](const YAML::Node &node) {
                     YAML::Emitter out;
                     out << node;
                     return std::string(out.c_str());
                 })
            .def("type", &YAML::Node::Type)
            .def("__len__", &YAML::Node::size);

        // Naming YAML::Node as the parent gives iterated values every YamlNode method.
        py::class_<YAML::detail::iterator_value, YAML::Node>(m, "YamlDetailIteratorValue")
            .def(py::init<>())
            // first/second expose the key and value when iterating over a map.
            .def("first", [](YAML::detail::iterator_value &val) { return val.first; })
            .def("second", [](YAML::detail::iterator_value &val) { return val.second; });

        m.def("load_file", &YAML::LoadFile, "");
    }

    I bound the NodeType enum so that it is exposed when you call type() on a node. Then I bound first and second for the iterator_value type, so you can access the keys and values of a map in a loop. You can switch on type() to figure out how to iterate. My example YAML file:

    ---
     doe: "a deer, a female deer"
     ray: "a drop of golden sun"
     pi: 3.14159
     xmas: true
     french-hens: 3
     calling-birds:
       - huey
       - dewey
       - louie
       - fred
     xmas-fifth-day:
       calling-birds: four
       french-hens: 3
       golden-rings: 5
       partridges:
         count: 1
         location: "a pear tree"
       turtle-doves: two
    

    And my example Python (3.8) code using the bound C++:

    import utilities
    from utilities import load_file

    def iterator(node):
        # Sequences iterate directly; maps yield (key, value) pairs.
        if node.type() == utilities.NodeType.Sequence:
            return node
        elif node.type() == utilities.NodeType.Map:
            return ((e.first(), e.second()) for e in node)
        # Anything else (e.g. a scalar) is wrapped in a one-element tuple.
        return (node,)

    test_node = load_file('test.yml')

    for key, value in iterator(test_node):
        if value.type() == utilities.NodeType.Sequence:
            print("list")
            for v in iterator(value):
                print(v)
        elif value.type() == utilities.NodeType.Map:
            print("map")
            for k, v in iterator(value):
                # Look the same entry up by key and compare it with the iterated value.
                temp = value[str(k)]
                print(k, v)
                print(str(v) == str(temp))
    

    This demonstrates correct iteration for the different types, and shows that __getitem__ on a map returns the same value as calling second() on the iterator_value. You probably want to overload __getitem__ for ints as well, so that it supports sequence access.
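Until the binding accepts integer keys, sequence elements can still be reached from Python using only the iteration and length support bound above. The following is a minimal sketch of such a workaround, not the C++ overload itself; `item_at` is a hypothetical helper name, and it assumes only that the node supports `len()` and iteration.

```python
from itertools import islice


def item_at(node, index):
    """Return the element at `index` of a sequence-like node.

    Hypothetical helper: it relies only on len() and iteration, so it does
    not need an integer overload of __getitem__ in the binding.
    Negative indices count from the end, as with Python lists.
    """
    size = len(node)
    if index < 0:
        index += size
    if not 0 <= index < size:
        raise IndexError("sequence index out of range")
    # Skip `index` elements, then take the next one.
    return next(islice(iter(node), index, None))
```

This walks the sequence from the start on every call, so it is only reasonable for short sequences or occasional lookups.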

    You've got a bonus __str__ method as well, to make all the print calls work.
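The type dispatch in the iterator helper above can also be folded into one recursive function that turns a whole node tree into plain Python containers. This is only a sketch: `to_python` is a hypothetical name, and `NodeType`, `FakeNode` and `FakePair` below are stand-ins that mimic the bound `type()`, `first()`, `second()`, `__iter__` and `__str__` methods so the example runs without the compiled module.

```python
from enum import Enum


class NodeType(Enum):
    """Stand-in for the bound NodeType enum (hypothetical, sketch only)."""
    Scalar = 1
    Sequence = 2
    Map = 3


class FakeNode:
    """Hypothetical stand-in exposing the same methods as the bound YamlNode."""

    def __init__(self, value):
        self._value = value

    def type(self):
        if isinstance(self._value, dict):
            return NodeType.Map
        if isinstance(self._value, list):
            return NodeType.Sequence
        return NodeType.Scalar

    def __iter__(self):
        # Maps yield pair-like values; sequences yield plain nodes.
        if isinstance(self._value, dict):
            return (FakePair(k, v) for k, v in self._value.items())
        return (FakeNode(v) for v in self._value)

    def __str__(self):
        return str(self._value)


class FakePair(FakeNode):
    """Mimics YamlDetailIteratorValue: a node that also carries first/second."""

    def __init__(self, key, value):
        super().__init__(None)
        self._key, self._val = FakeNode(key), FakeNode(value)

    def first(self):
        return self._key

    def second(self):
        return self._val


def to_python(node, node_types=NodeType):
    """Recursively convert a node into dicts, lists and strings."""
    kind = node.type()
    if kind == node_types.Map:
        return {str(e.first()): to_python(e.second(), node_types) for e in node}
    if kind == node_types.Sequence:
        return [to_python(e, node_types) for e in node]
    # Scalars (and anything else) are returned as their string form.
    return str(node)
```

With the real binding, the call would presumably pass the loaded node and the bound module's NodeType as `node_types`. Every scalar comes back as a string, since the binding shown here exposes no typed accessors.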