
Boost Spirit x3 -- Parameterizing Parsers with other Parsers


I don't have a whole lot of code to show for this one because I haven't managed to get anything to work, but the high level problem is that I am trying to create a series of parsers for a family of related languages. What I mean by this is that the languages will share many of the same constructs, but there won't be complete overlap. As a simple example, say I have an AST that is parameterized by some (completely contrived in this example) 'leaf' type:

template <typename t>
struct fooT {
  std::string name;
  t leaf;
};

One language may have t instantiated as int and one as double. What I wanted to do was create a templated class or something that I could instantiate with different t's and corresponding parser rules so that I could generate a series of composed parsers.

In my real example, I have a bunch of nested structures that are the same across the languages, with only a couple of small variations at the very edges of the AST, so if I cannot compose the parsers cleanly, I will end up duplicating a bunch of parse rules, AST nodes, etc. I have actually gotten it to work without a class, by very carefully arranging my header files and includes so that I can have 'dangling' parser rules with special names that can be assembled. A big downside is that I cannot include parsers for several of the languages in the same program, precisely because of the name conflicts that arise.

Does anybody have any ideas how I could approach this?


Solution

  • The nice thing about X3 is that you can generate parsers just as easily as you define them in the first place.

    E.g.

    template <typename T> struct AstNode {
        std::string name;
        T leaf;
    };
    

    Now let's define a generic parser maker:

    namespace Generic {
        template <typename T> auto leaf = x3::eps(false);
    
        template <> auto leaf<int>
            = "0x" >> x3::uint_parser<uintmax_t, 16>{};
        template <> auto leaf<std::string>
            = x3::lexeme['"' >> *~x3::char_('"') >> '"'];
    
        auto no_comment = x3::space;
        auto hash_comments = x3::space |
            x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
        auto c_style_comments = x3::space |
            "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
        auto cxx_style_comments = c_style_comments |
            x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    
        auto name = leaf<std::string>;
    
        template <typename T> auto parseNode(auto heading, auto skipper) {
            return x3::skip(skipper)[
                x3::as_parser(heading) >> name >> ":" >> leaf<T>
            ];
        }
    }
    

    This allows us to compose various grammars with various leaf types and skipper styles:

    namespace Language1 {
        static auto const grammar =
            Generic::parseNode<int>("value", Generic::no_comment);
    }
    
    namespace Language2 {
        static auto const grammar =
            Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
    }
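
    The per-type selection in Generic::leaf<T> relies on explicitly specialized variable templates. Stripped of X3, the mechanism can be sketched roughly as below. Everything here is a hypothetical stand-in: Parser, parse_leaf and the hand-written matchers are not part of Spirit, and unlike the X3 version every specialization shares one function-pointer type to keep the sketch simple.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a parser: maps the whole input to an optional value.
template <typename T>
using Parser = std::optional<T> (*)(std::string_view);

// Primary template: always fails, playing the role of x3::eps(false).
template <typename T>
const Parser<T> leaf = [](std::string_view) -> std::optional<T> {
    return std::nullopt;
};

// Specialization for int: "0x" followed by one or more hex digits.
// Overflow is not checked in this sketch.
template <>
const Parser<int> leaf<int> = [](std::string_view s) -> std::optional<int> {
    if (s.size() < 3 || s.substr(0, 2) != "0x") return std::nullopt;
    int value = 0;
    for (char c : s.substr(2)) {
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return std::nullopt;
        value = value * 16 + digit;
    }
    return value;
};

// Specialization for std::string: text enclosed in double quotes.
template <>
const Parser<std::string> leaf<std::string> =
    [](std::string_view s) -> std::optional<std::string> {
        if (s.size() < 2 || s.front() != '"' || s.back() != '"')
            return std::nullopt;
        return std::string(s.substr(1, s.size() - 2));
    };

// Analogue of parseNode<T>: the template argument alone picks the leaf parser.
template <typename T>
std::optional<T> parse_leaf(std::string_view text) {
    return leaf<T>(text);
}
```

    With these definitions, parse_leaf<int>("0x1f") yields 31, while parse_leaf<double>("1.5") yields an empty optional because no specialization exists for double and the always-failing primary template is used.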
    

    Let's Demo:

    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <string_view>
    namespace x3 = boost::spirit::x3;
    
    template <typename T> struct AstNode {
        std::string name;
        T leaf;
    };
    
    BOOST_FUSION_ADAPT_TPL_STRUCT((T), (AstNode)(T), name, leaf)
    
    namespace Generic {
        template <typename T> auto leaf = x3::eps(false);
    
        template <> auto leaf<int>
            = "0x" >> x3::uint_parser<uintmax_t, 16>{};
        template <> auto leaf<std::string>
            = x3::lexeme['"' >> *~x3::char_('"') >> '"'];
    
        auto no_comment = x3::space;
        auto hash_comments = x3::space |
            x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
        auto c_style_comments = x3::space |
            "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
        auto cxx_style_comments = c_style_comments |
            x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    
        auto name = leaf<std::string>;
    
        template <typename T> auto parseNode(auto heading, auto skipper) {
            return x3::skip(skipper)[
                x3::as_parser(heading) >> name >> ":" >> leaf<T>
            ];
        }
    }
    
    namespace Language1 {
        static auto const grammar =
            Generic::parseNode<int>("value", Generic::no_comment);
    }
    
    namespace Language2 {
        static auto const grammar =
            Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
    }
    
    void test(auto const& grammar, std::string_view text, auto ast) {
        auto f = text.begin(), l = text.end();
        std::cout << "\nParsing: " << std::quoted(text, '\'') << "\n";
        if (parse(f, l, grammar, ast)) {
            std::cout << " -> {name:" << ast.name << ",value:" << ast.leaf << "}\n";
        } else {
            std::cout << " -- Failed " << std::quoted(text, '\'') << "\n";
        }
    }
    
    int main() {
        test(Language1::grammar, R"(value "one": 0x01)", AstNode<int>{});
        test(
            Language2::grammar,
            R"(line "Hamlet": "There is nothing either good or bad, but thinking makes it so.")",
            AstNode<std::string>{});
    
        test(
            Language2::grammar,
            R"(line // rejected: "Hamlet": "To be ..."
            "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods")",
            AstNode<std::string>{});
    }
    

    Prints

    Parsing: 'value "one": 0x01'
     -> {name:one,value:1}
    
    Parsing: 'line "Hamlet": "There is nothing either good or bad, but thinking makes it so."'
     -> {name:Hamlet,value:There is nothing either good or bad, but thinking makes it so.}
    
    Parsing: 'line // rejected: "Hamlet": "To be ..."
            "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods"'
     -> {name:King Lear,value:As flies to wanton boys are we to the gods}
    

    Advanced

    For advanced scenarios (where rule declarations and definitions are separated across translation units and/or you require dynamic switching), you can use the type-erased x3::any_parser<> holder.
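
    To illustrate what a type-erased holder buys you for dynamic switching, here is a standard-library-only analogy. It does not use the X3 API: AnyIntParser, parse_decimal, parse_hex and select_parser are hypothetical names, and std::function stands in for the holder.

```cpp
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical type-erased holder: any callable with this signature fits.
using AnyIntParser = std::function<std::optional<int>(std::string_view)>;

// Shared helper: parse all of `s` as digits in the given base (10 or 16).
// Overflow is not checked in this sketch.
inline std::optional<int> parse_digits(std::string_view s, int base) {
    if (s.empty()) return std::nullopt;
    int value = 0;
    for (char c : s) {
        int digit = -1;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        if (digit < 0 || digit >= base) return std::nullopt;
        value = value * base + digit;
    }
    return value;
}

// Two independent "grammars" that could live in separate translation units.
inline std::optional<int> parse_decimal(std::string_view s) {
    return parse_digits(s, 10);
}

inline std::optional<int> parse_hex(std::string_view s) {
    if (s.substr(0, 2) != "0x") return std::nullopt;
    return parse_digits(s.substr(2), 16);
}

// The choice is made at run time; callers only ever see AnyIntParser.
inline AnyIntParser select_parser(bool hex) {
    return hex ? AnyIntParser(parse_hex) : AnyIntParser(parse_decimal);
}
```

    The trade-off is the usual one for type erasure: the caller's code no longer depends on the concrete parser type, at the cost of an indirect call per parse.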