How to parse text for a DSL at compile time?


Yes. That's right. I want to be able to paste an expression like:

"a && b || c"

directly into source code as a string:

const std::string expression_text("a && b || c");

Create a lazily evaluated structure with it:

Expr expr(magical_function(expression_text));

then later on evaluate substituting in known values:

evaluate(expr, a, b, c);

I want to extend this little DSL later so that it does something a little more complicated, using some non-C++ syntax, so I can't simply hardcode my expression as plain C++. The use case is that I can copy and paste the same logic from a module written in another language and used in a different development area, rather than having to adapt it to C++ syntax each time.

If someone can get me started with at least the simple case above (one expression with two boolean operators), that would be really appreciated.

Note: I posted this question because of feedback on another question I posted: How to parse DSL input to high performance expression template. There I actually wanted an answer to a slightly different problem, but the comments prompted this specific question, which I thought was worth posting because the potential answers are worth documenting.


Solution

  • Disclaimer: I know nothing about metaparse, and very little about proto. The following code is my attempt (mostly via trial and error) to modify this example to do something similar to what you want.

    The code can easily be divided into several parts:

    1. The grammar


    1.1 Token definitions

    typedef token < lit_c < 'a' > > arg1_token;
    typedef token < lit_c < 'b' > > arg2_token;
    typedef token < lit_c < 'c' > > arg3_token;
    
    • token<Parser>:
      token is a parser combinator that uses Parser to parse the input and then consumes (and discards) any whitespace that follows. The result of the parsing is the result of Parser.
    • lit_c<char>:
      lit_c matches the specific char and the result of the parsing is that same char. In the grammar this result is overridden by the use of always.

    typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
    typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;
    
    • keyword<metaparse_string,result_type=undefined>:
      keyword matches the specific metaparse_string (_S("true") expands to metaparse::string<'t','r','u','e'>, which is what metaparse uses internally to do its magic) and the result of the parsing is result_type.

    typedef token < keyword < _S ( "&&" ) > > and_token;
    typedef token < keyword < _S ( "||" ) > > or_token;
    typedef token < lit_c < '!' > > not_token;
    

    In the case of and_token and or_token the result is undefined and in the grammar below it is ignored.
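    For comparison with ordinary code, the behaviour of token, lit_c and keyword can be sketched with C++17 constexpr functions over std::string_view. This is a minimal sketch, not how metaparse is implemented: ParseResult, skip_ws, lit and keyword are hypothetical names chosen to mirror the combinators, and only spaces, tabs and newlines are treated as whitespace.

```cpp
#include <string_view>

// Hypothetical result type: did the parser match, and what input remains?
struct ParseResult {
    bool ok;
    std::string_view rest;
};

// Mirrors what token<> adds: discard whitespace after a successful match.
constexpr std::string_view skip_ws(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t' || s.front() == '\n'))
        s.remove_prefix(1);
    return s;
}

// Mirrors token<lit_c<C>>: match one specific character.
constexpr ParseResult lit(std::string_view s, char c) {
    if (!s.empty() && s.front() == c)
        return {true, skip_ws(s.substr(1))};
    return {false, s};
}

// Mirrors token<keyword<S>>: match a fixed string such as "&&" or "true".
constexpr ParseResult keyword(std::string_view s, std::string_view kw) {
    if (s.substr(0, kw.size()) == kw)
        return {true, skip_ws(s.substr(kw.size()))};
    return {false, s};
}

// Everything below is checked by the compiler.
static_assert(lit("a  && b", 'a').ok);
static_assert(lit("a  && b", 'a').rest == "&& b");
static_assert(keyword("&& b", "&&").rest == "b");
static_assert(!keyword("|| b", "&&").ok);
```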


    1.2 "Rules" of the grammar

    struct paren_exp;
    

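    First, paren_exp is forward-declared, because value_exp refers to it while paren_exp itself (defined further down) refers back to or_exp.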

    typedef one_of< 
            paren_exp, 
            transform<true_token, build_value>,
            transform<false_token, build_value>, 
            always<arg1_token, arg<0> >,
            always<arg2_token, arg<1> >, 
            always<arg3_token, arg<2> > 
        >
        value_exp;
    
    • one_of<Parsers...>:
      one_of is a parser combinator that tries to match the input to one of its parameters. The result is what the first parser that matches returns.
    • transform<Parser,SemanticAction>:
      transform is a parser combinator that matches Parser. The result type is the result type of Parser transformed by SemanticAction.
    • always<Parser,NewResultType>:
      matches Parser, returns NewResultType.

      The equivalent spirit rule would be:

      value_exp = paren_exp [ _val=_1 ]
          | true_token      [ _val=build_value(_1) ]
          | false_token     [ _val=build_value(_1) ]
          | argN_token      [ _val=phx::construct<arg<N>>() ];
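    Ordered choice (one_of) combined with a fixed result per alternative (always, or transform for true/false) can be sketched the same way in plain C++17. The Value enum, starts_with and value_exp below are hypothetical stand-ins, and paren_exp and whitespace skipping are left out to keep the sketch short.

```cpp
#include <string_view>

// Hypothetical stand-ins for the results of value_exp's alternatives.
enum class Value { None, True, False, Arg0, Arg1, Arg2 };

struct ValueResult {
    Value v;                  // which alternative matched
    std::string_view rest;    // unconsumed input
};

constexpr bool starts_with(std::string_view s, std::string_view p) {
    return s.substr(0, p.size()) == p;
}

// Ordered choice in the spirit of one_of<...>: alternatives are tried in order
// and the first one that matches wins. Each alternative yields a fixed result,
// which is what always<argN_token, arg<N>> does in the metaparse grammar.
constexpr ValueResult value_exp(std::string_view s) {
    if (starts_with(s, "true"))  return {Value::True,  s.substr(4)};
    if (starts_with(s, "false")) return {Value::False, s.substr(5)};
    if (starts_with(s, "a"))     return {Value::Arg0,  s.substr(1)};
    if (starts_with(s, "b"))     return {Value::Arg1,  s.substr(1)};
    if (starts_with(s, "c"))     return {Value::Arg2,  s.substr(1)};
    return {Value::None, s};  // every alternative failed
}

static_assert(value_exp("b&&c").v == Value::Arg1);
static_assert(value_exp("b&&c").rest == "&&c");
```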
      
    typedef one_of< 
            transform<last_of<not_token, value_exp>, build_not>, 
            value_exp
        >
        not_exp;
    
    • last_of<Parsers...>:
      last_of matches every one of the Parsers in sequence and its result type is the result type of the last parser.

      The equivalent spirit rule would be:

      not_exp = (omit[not_token] >> value_exp) [ _val=build_not(_1) ] 
          | value_exp                          [ _val=_1 ];
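    A similar C++17 sketch of not_exp shows what last_of and transform contribute: the '!' is matched but its result is dropped, and the result of value_exp is then negated. Res and the reduced value_exp (which only understands the letters t and f) are hypothetical simplifications, not part of the answer's grammar.

```cpp
#include <string_view>

// Hypothetical parse result: matched?, semantic value, remaining input.
struct Res {
    bool ok;
    bool value;
    std::string_view rest;
};

// value_exp reduced to the two literals 't' and 'f' for brevity (assumption).
constexpr Res value_exp(std::string_view s) {
    if (!s.empty() && (s.front() == 't' || s.front() == 'f'))
        return {true, s.front() == 't', s.substr(1)};
    return {false, false, s};
}

// not_exp = one_of< transform< last_of<'!', value_exp>, build_not >, value_exp >
constexpr Res not_exp(std::string_view s) {
    if (!s.empty() && s.front() == '!') {           // first alternative: '!' ...
        Res r = value_exp(s.substr(1));              // ... last_of keeps only this result
        if (r.ok) return {true, !r.value, r.rest};   // transform applies the negation
    }
    return value_exp(s);                             // second alternative
}

static_assert(not_exp("!t").ok && !not_exp("!t").value);
static_assert(not_exp("f").ok && !not_exp("f").value);
static_assert(not_exp("!f").value);
```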
      
    typedef
    foldl_start_with_parser<
            last_of<and_token, not_exp>,
            not_exp,
            build_and
        > and_exp; // and_exp = not_exp >> *(omit[and_token] >> not_exp);
    
    typedef
    foldl_start_with_parser<
        last_of<or_token, and_exp>,
        and_exp,
        build_or
    > or_exp;     // or_exp = and_exp >> *(omit[or_token] >> and_exp);
    
    • foldl_start_with_parser<RepeatingParser,InitialParser,SemanticAction>:
      this parser combinator matches InitialParser and then RepeatingParser multiple times until it fails. The result type is the result of mpl::fold<RepeatingParserSequence, InitialParserResult, SemanticAction>, where RepeatingParserSequence is a sequence of the result types of every application of RepeatingParser. If RepeatingParser never succeeds the result type is simply InitialParserResult.

      Roughly, the equivalent Spirit rule would be:

      or_exp = and_exp [ _val = _1 ]
          >> *( omit[or_token] >> and_exp [ _val = build_or(_val, _1) ] );
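    The fold can be checked at the type level. The following C++17 sketch uses hypothetical Arg, Not, Or and fold_left templates (not metaparse's own) with the same calling convention as mpl::fold, where the accumulated state is passed first and the newly parsed element second. It shows which tree results for "a || !b" and that "a || b || c" groups to the left.

```cpp
#include <type_traits>

// Hypothetical stand-ins for arg<N>, build_not::apply<...> and build_or::apply<...>.
template <int N> struct Arg {};
template <class E> struct Not {};
template <class L, class R> struct Or {};

// Semantic action with the mpl::fold calling convention:
// accumulated state first, newly parsed element second.
struct build_or {
    template <class State, class C>
    struct apply { using type = Or<State, C>; };
};

// fold_left<Op, Initial, E1, E2, ...> computes Op(Op(Initial, E1), E2) ...,
// i.e. the InitialParser result combined with each RepeatingParser result.
template <class Op, class State, class... Elems>
struct fold_left { using type = State; };

template <class Op, class State, class E, class... Rest>
struct fold_left<Op, State, E, Rest...>
    : fold_left<Op, typename Op::template apply<State, E>::type, Rest...> {};

// "a || !b": initial result Arg<0>, one repeated result Not<Arg<1>>.
static_assert(std::is_same_v<fold_left<build_or, Arg<0>, Not<Arg<1>>>::type,
                             Or<Arg<0>, Not<Arg<1>>>>);
// "a || b || c" is left-associative: (a || b) || c.
static_assert(std::is_same_v<fold_left<build_or, Arg<0>, Arg<1>, Arg<2>>::type,
                             Or<Or<Arg<0>, Arg<1>>, Arg<2>>>);
```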
      
    struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; 
       // paren_exp = '(' >> or_exp >> ')';
    
    • middle_of<Parsers...>:
      middle_of matches the sequence of Parsers and its result is the result of the parser in the middle.

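    The forward declaration is needed because the rules form a cycle: value_exp mentions paren_exp, and paren_exp mentions or_exp, which is built from value_exp. The sketch below uses hypothetical empty templates (one_of_, fold_, middle_of_, lit_) to show why paren_exp is a struct deriving from its definition rather than a typedef: a class can be declared before it is defined while an alias cannot, and merely naming an incomplete class as a template argument does not require it to be complete.

```cpp
#include <type_traits>

// Hypothetical empty combinator templates: they only record their arguments,
// so an incomplete class may appear among them.
template <class... Ps> struct one_of_ {};
template <class P> struct fold_ {};
template <class Open, class P, class Close> struct middle_of_ {};
template <char C> struct lit_ {};

struct paren_exp_;                                             // declared, not yet defined
using value_exp_ = one_of_<paren_exp_, lit_<'a'>, lit_<'b'>>;  // value_exp -> paren_exp
using or_exp_ = fold_<value_exp_>;                             // stand-in for the and/or layers

// An alias cannot be declared ahead of its definition (and could not refer to
// itself), so the recursive rule is a class deriving from its definition.
struct paren_exp_ : middle_of_<lit_<'('>, or_exp_, lit_<')'>> {};  // paren_exp -> or_exp

static_assert(std::is_base_of_v<middle_of_<lit_<'('>, or_exp_, lit_<')'>>, paren_exp_>);
```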
    typedef last_of<repeated<space>, or_exp> expression; 
       //expression = omit[*space] >> or_exp;
    
    • repeated<Parser>:
      repeated tries to match Parser as many times as possible. Its result is a sequence of the results of every successful application; if the parser fails on its first try, the result is an empty sequence. In expression, repeated<space> simply skips any leading whitespace, since token only skips whitespace after each token.

    typedef build_parser<entire_input<expression> > function_parser;
    

    build_parser turns the parser into a metafunction class that takes an input string and returns the result of parsing it; entire_input makes the parse fail unless the whole string is consumed.
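    As a point of comparison only, the same grammar can also be written as a hand-rolled recursive-descent parser in C++17 constexpr, with one member function per rule. This swaps in a different technique: it evaluates while parsing instead of building a lazily evaluated proto tree, so the string is re-parsed on every call. BoolParser, EvalResult and evaluate are hypothetical names, and only the space character is treated as whitespace.

```cpp
#include <cstddef>
#include <string_view>

// Result of evaluating a DSL string; valid is false on a syntax error.
struct EvalResult {
    bool valid;
    bool value;
};

// Hypothetical recursive-descent evaluator: one member function per rule of
// the metaparse grammar. It evaluates while parsing instead of building a tree.
struct BoolParser {
    std::string_view src;
    bool args[3];              // values substituted for a, b and c
    std::size_t pos = 0;
    bool valid = true;

    constexpr void skip_ws() {
        while (pos < src.size() && src[pos] == ' ') ++pos;
    }
    // Like token<keyword<...>>, but skipping whitespace before the token.
    constexpr bool accept(std::string_view tok) {
        skip_ws();
        if (src.substr(pos, tok.size()) != tok) return false;
        pos += tok.size();
        return true;
    }
    // value_exp = paren_exp | true | false | a | b | c
    constexpr bool value_exp() {
        if (accept("(")) {                   // paren_exp = '(' or_exp ')'
            bool v = or_exp();
            if (!accept(")")) valid = false;
            return v;
        }
        if (accept("true")) return true;
        if (accept("false")) return false;
        if (accept("a")) return args[0];
        if (accept("b")) return args[1];
        if (accept("c")) return args[2];
        valid = false;                       // no alternative matched
        return false;
    }
    // not_exp = '!' value_exp | value_exp
    constexpr bool not_exp() { return accept("!") ? !value_exp() : value_exp(); }
    // and_exp = not_exp ("&&" not_exp)*, folded from the left
    constexpr bool and_exp() {
        bool state = not_exp();
        while (accept("&&")) {
            bool rhs = not_exp();            // always parse the operand (no short-circuit)
            state = state && rhs;
        }
        return state;
    }
    // or_exp = and_exp ("||" and_exp)*, folded from the left
    constexpr bool or_exp() {
        bool state = and_exp();
        while (accept("||")) {
            bool rhs = and_exp();
            state = state || rhs;
        }
        return state;
    }
};

constexpr EvalResult evaluate(std::string_view src, bool a, bool b, bool c) {
    BoolParser p{src, {a, b, c}};
    bool value = p.or_exp();
    p.skip_ws();
    bool complete = p.pos == src.size();     // the role of entire_input
    return {p.valid && complete, value};
}

static_assert(evaluate("a && b || c", true, false, true).valid);
static_assert(evaluate("a && b || c", true, false, true).value);
static_assert(!evaluate("a && (b || c)", true, false, false).value);
```

    With constant arguments, mistakes in the expression are caught by static_assert; with runtime values the same function simply runs at run time.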


    2. Construction of the expression

    Let's walk through how an expression is built. This happens in two steps: first the grammar constructs a type-level tree made of build_or, build_and, build_value, build_not and arg<N>. Once you have that type, you get the proto expression from its proto_type typedef.

    "a || !b"

    We start on or_exp:

    • or_exp: We try its InitialParser, which is and_exp.
      • and_exp: We try its InitialParser, which is not_exp.
        • not_exp: not_token fails, so we try value_exp.
          • value_exp: arg1_token succeeds. The result is arg<0> and we go back to not_exp.
        • not_exp: the result is not modified at this step. We go back to and_exp.
      • and_exp: We try its RepeatingParser; it fails. and_exp succeeds and its result is the result of its InitialParser: arg<0>. We go back to or_exp.
    • or_exp: We try its RepeatingParser: or_token matches, so we try and_exp.
      • and_exp: We try its InitialParser, not_exp.
        • not_exp: not_token succeeds, so we try value_exp.
          • value_exp: arg2_token succeeds. The result is arg<1> and we go back to not_exp.
        • not_exp: transform applies build_not to the result, giving build_not::apply< arg<1> >. We go back to and_exp.
      • and_exp: We try its RepeatingParser; it fails. and_exp succeeds and returns build_not::apply< arg<1> >. We go back to or_exp.
    • or_exp: RepeatingParser has succeeded, so foldl_start_with_parser calls build_or with the current state arg<0> and the new result build_not::apply< arg<1> > (state first, as in mpl::fold), obtaining build_or::apply< arg<0>, build_not::apply< arg<1> > >. The next attempt of RepeatingParser fails at the end of the input, so this is the result of or_exp.

    Once we have this tree constructed we get its proto_type:

    build_or::apply< arg<0>, build_not::apply< arg<1> > >::proto_type;
    proto::logical_or< arg<0>::proto_type, build_not::apply< arg<1> >::proto_type >::type;
    proto::logical_or< proto::terminal< placeholder<0> >::type, build_not::apply< arg<1> >::proto_type >::type;
    proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< arg<1>::proto_type >::type >::type;
    proto::logical_or< proto::terminal< placeholder<0> >::type, proto::logical_not< proto::terminal< placeholder<1> >::type >::type >::type;
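    To see how such a nested type is evaluated, here is a minimal C++17 analogue of proto::eval with calculator_context. Arg, Value, Not, And, Or and Eval are hypothetical stand-ins for the proto terminals and operators above, not proto's API; the evaluator walks the type recursively and reads placeholder values from a std::tuple, much as calculator_context reads them from a fusion vector.

```cpp
#include <tuple>

// Hypothetical node types mirroring the proto types above:
// Arg<N> ~ terminal<placeholder<N>>, Value<B> ~ terminal<value<B>>,
// Not ~ logical_not, And ~ logical_and, Or ~ logical_or.
template <int N> struct Arg {};
template <bool B> struct Value {};
template <class E> struct Not {};
template <class L, class R> struct And {};
template <class L, class R> struct Or {};

// Eval<T>::run walks the type recursively, like proto::eval with a context.
template <class T> struct Eval;

template <int N> struct Eval<Arg<N>> {
    template <class Args>
    static constexpr bool run(const Args& args) { return std::get<N>(args); }
};
template <bool B> struct Eval<Value<B>> {
    template <class Args>
    static constexpr bool run(const Args&) { return B; }
};
template <class E> struct Eval<Not<E>> {
    template <class Args>
    static constexpr bool run(const Args& args) { return !Eval<E>::run(args); }
};
template <class L, class R> struct Eval<And<L, R>> {
    template <class Args>
    static constexpr bool run(const Args& args) {
        return Eval<L>::run(args) && Eval<R>::run(args);
    }
};
template <class L, class R> struct Eval<Or<L, R>> {
    template <class Args>
    static constexpr bool run(const Args& args) {
        return Eval<L>::run(args) || Eval<R>::run(args);
    }
};

// Substitute the given values for Arg<0>, Arg<1>, ... and evaluate.
template <class Expr, class... Bools>
constexpr bool evaluate(Expr, Bools... values) {
    return Eval<Expr>::run(std::make_tuple(values...));
}

// The type the walkthrough derives for "a || !b".
using a_or_not_b = Or<Arg<0>, Not<Arg<1>>>;

static_assert(evaluate(a_or_not_b{}, false, false));   // !b is true
static_assert(!evaluate(a_or_not_b{}, false, true));   // a and !b are both false
```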
    

    Full Sample Code (Running on Wandbox)

    #include <iostream>
    #include <vector>
    
    #include <boost/metaparse/repeated.hpp>
    #include <boost/metaparse/sequence.hpp>
    #include <boost/metaparse/lit_c.hpp>
    #include <boost/metaparse/last_of.hpp>
    #include <boost/metaparse/middle_of.hpp>
    #include <boost/metaparse/space.hpp>
    #include <boost/metaparse/foldl_start_with_parser.hpp>
    #include <boost/metaparse/one_of.hpp>
    #include <boost/metaparse/token.hpp>
    #include <boost/metaparse/entire_input.hpp>
    #include <boost/metaparse/string.hpp>
    #include <boost/metaparse/transform.hpp>
    #include <boost/metaparse/always.hpp>
    #include <boost/metaparse/build_parser.hpp>
    #include <boost/metaparse/keyword.hpp>
    
    #include <boost/mpl/apply_wrap.hpp>
    #include <boost/mpl/front.hpp>
    #include <boost/mpl/back.hpp>
    #include <boost/mpl/bool.hpp>
    
    #include <boost/proto/proto.hpp>
    #include <boost/fusion/include/at.hpp>
    #include <boost/fusion/include/make_vector.hpp>
    
    using boost::metaparse::sequence;
    using boost::metaparse::lit_c;
    using boost::metaparse::last_of;
    using boost::metaparse::middle_of;
    using boost::metaparse::space;
    using boost::metaparse::repeated;
    using boost::metaparse::build_parser;
    using boost::metaparse::foldl_start_with_parser;
    using boost::metaparse::one_of;
    using boost::metaparse::token;
    using boost::metaparse::entire_input;
    using boost::metaparse::transform;
    using boost::metaparse::always;
    using boost::metaparse::keyword;
    
    using boost::mpl::apply_wrap1;
    using boost::mpl::front;
    using boost::mpl::back;
    using boost::mpl::bool_;
    
    
    // Semantic actions for foldl_start_with_parser. Like boost::mpl::fold,
    // metaparse passes the accumulated state first and the newly parsed
    // element second, so the left operand comes from State.
    struct build_or
    {
        typedef build_or type;
    
        template <class State, class C>
        struct apply
        {
            typedef apply type;
            typedef typename boost::proto::logical_or<typename State::proto_type, typename C::proto_type >::type proto_type;
        };
    };
    
    struct build_and
    {
        typedef build_and type;
    
        template <class State, class C>
        struct apply
        {
            typedef apply type;
            typedef typename boost::proto::logical_and<typename State::proto_type, typename C::proto_type >::type proto_type;
        };
    };
    
    
    
    template<bool I>
    struct value //helper struct that will be used during the evaluation in the proto context
    {};
    
    struct build_value
    {
        typedef build_value type;
    
        template <class V>
        struct apply
        {
            typedef apply type;
            typedef typename boost::proto::terminal<value<V::type::value> >::type proto_type;
        };
    };
    
    struct build_not
    {
        typedef build_not type;
    
        template <class V>
        struct apply
        {
            typedef apply type;
            typedef typename boost::proto::logical_not<typename V::proto_type >::type proto_type;
        };
    };
    
    template<int I>
    struct placeholder //helper struct that will be used during the evaluation in the proto context
    {};
    
    template<int I>
    struct arg
    {
        typedef arg type;
        typedef typename boost::proto::terminal<placeholder<I> >::type proto_type;
    };
    
    #ifdef _S
    #error _S already defined
    #endif
    #define _S BOOST_METAPARSE_STRING
    
    typedef token < keyword < _S ( "&&" ) > > and_token;
    typedef token < keyword < _S ( "||" ) > > or_token;
    typedef token < lit_c < '!' > > not_token;
    
    typedef token < keyword < _S ( "true" ), bool_<true> > > true_token;
    typedef token < keyword < _S ( "false" ), bool_<false> > > false_token;
    
    typedef token < lit_c < 'a' > > arg1_token;
    typedef token < lit_c < 'b' > > arg2_token;
    typedef token < lit_c < 'c' > > arg3_token;
    
    
    struct paren_exp;
    
    typedef
    one_of< paren_exp, transform<true_token, build_value>, transform<false_token, build_value>, always<arg1_token, arg<0> >, always<arg2_token, arg<1> >, always<arg3_token, arg<2> > >
    value_exp; //value_exp = paren_exp | true_token | false_token | arg1_token | arg2_token | arg3_token;
    
    typedef
    one_of< transform<last_of<not_token, value_exp>, build_not>, value_exp>
    not_exp; //not_exp = (omit[not_token] >> value_exp) | value_exp;
    
    typedef
    foldl_start_with_parser <
    last_of<and_token, not_exp>,
             not_exp,
             build_and
             >
             and_exp; // and_exp = not_exp >> *(and_token >> not_exp);
    
    typedef
    foldl_start_with_parser <
    last_of<or_token, and_exp>,
             and_exp,
             build_or
             >
             or_exp; // or_exp = and_exp >> *(or_token >> and_exp);
    
    struct paren_exp: middle_of < lit_c < '(' > , or_exp, lit_c < ')' > > {}; //paren_exp = lit('(') >> or_exp >> lit(')');
    
    typedef last_of<repeated<space>, or_exp> expression; //expression = omit[*space] >> or_exp;
    
    typedef build_parser<entire_input<expression> > function_parser;
    
    
    template <typename Args>
    struct calculator_context
            : boost::proto::callable_context< calculator_context<Args> const >
    {
        calculator_context ( const Args& args ) : args_ ( args ) {}
        // Values to replace the placeholders
        const Args& args_;
    
        // Define the result type of the calculator.
        // (This makes the calculator_context "callable".)
        typedef bool result_type;
    
        // Handle the placeholders:
        template<int I>
        bool operator() ( boost::proto::tag::terminal, placeholder<I> ) const
        {
            return boost::fusion::at_c<I> ( args_ );
        }
    
        template<bool I>
        bool operator() ( boost::proto::tag::terminal, value<I> ) const
        {
            return I;
        }
    };
    
    template <typename Args>
    calculator_context<Args> make_context ( const Args& args )
    {
        return calculator_context<Args> ( args );
    }
    
    template <typename Expr, typename ... Args>
    bool evaluate ( const Expr& expr, const Args& ... args ) // the DSL is boolean, so return bool
    {
        return boost::proto::eval ( expr, make_context ( boost::fusion::make_vector ( args... ) ) );
    }
    
    #ifdef LAMBDA
    #error LAMBDA already defined
    #endif
    #define LAMBDA(exp) apply_wrap1<function_parser, _S(exp)>::type::proto_type{}
    
    int main()
    {
        using std::cout;
        using std::endl;
    
        cout << evaluate ( LAMBDA ( "true&&false" ) ) << endl;
        cout << evaluate ( LAMBDA ( "true&&a" ), false ) << endl;
        cout << evaluate ( LAMBDA ( "true&&a" ), true ) << endl;
        cout << evaluate ( LAMBDA ( "a&&b" ), true, false ) << endl;
        cout << evaluate ( LAMBDA ( "a&&(b||c)" ), true, false, true ) << endl;
        cout << evaluate ( LAMBDA ( "!a&&(false||(b&&!c||false))" ), false, true, false ) << endl;
    }
    
    /*int main(int argc , char** argv)
    {
        using std::cout;
        using std::endl;
    
        bool a=false, b=false, c=false;
    
        if(argc==4)
        {
            a=(argv[1][0]=='1');
            b=(argv[2][0]=='1');
            c=(argv[3][0]=='1');
        }
    
        auto expr = LAMBDA("a && b || c"); // LAMBDA already yields an expression object
    
        cout << evaluate(expr, true, true, false) << endl;
        cout << evaluate(expr, a, b, c) << endl;
    
        return 0;
    }*/