Search code examples
c++boostboost-spirit-x3

How to write binary operator with two post operands syntax with Boost Spirit x3?


I am following this example: https://github.com/boostorg/spirit/blob/develop/example/x3/calc/calc9/expression_def.hpp

What I am trying to accomplish is to write a rule that parses and generates like min{x}{y}. Mostly the code is using expression grammar like x + y, but now I want to place and parse both operands to the rhs of the operator.

I added the following code in expression_def.hpp file:

    ...
    x3::symbols<ast::optoken> additive_op;
    x3::symbols<ast::optoken> multiplicative_op;
    x3::symbols<ast::optoken> binarypost_op;
    x3::symbols<ast::optoken> unary_op;
    x3::symbols<> keywords;
    ...

    binarypost_op.add
        ("min", ast::op_divide) // Dummy operation usage for now
        ;
    ...
    struct binarypost_expr_class;
    struct unary_expr_class; 
    ...
    typedef x3::rule<binarypost_expr_class, ast::expression> 
    binarypost_expr_type;
    ...
    binarypost_expr_type const binarypost_expr = "binarypost_expr";
    ... 

    auto const multiplicative_expr_def =
    binarypost_expr
    >> *(multiplicative_op > binarypost_expr)
    ;
    auto const binarypost_expr_def =           // See the chaining operation
    ('{' > unary_expr > '}')
    >> *(binarypost_op > ('{' > unary_expr > '}'))
    ;
    auto const unary_expr_def =
        primary_expr
    |   (unary_op > primary_expr)
    ;

This works fine. But it can only parse something like , {x} min {y}. I want to be able to parse min {x} {y}. I tried the many combinations such as :

binarypost_op >> ('{' > unary_expr > '}') > ('{' > unary_expr > '}') etc. But I cant seem to figure it out as to what is the right way to write this? Any suggestions / comments ?


Solution

  • Ok, here's the changes. The hard part is actually code-generating the builtin function.

    Parsing


    Step 1: extend AST

    Always start with the AST. We want operands that can be function calls:

    In ast.hpp:

    struct function_call;  // ADDED LINE
    
    // ...
    
    struct operand :
        x3::variant<
            nil
          , unsigned int
          , variable
          , x3::forward_ast<unary>
          , x3::forward_ast<expression>
          , x3::forward_ast<function_call> // ADDED LINE
        >
    {
        using base_type::base_type;
        using base_type::operator=;
    };
    
    // ...
    
    enum funtoken
    {
        fun_min,
        fun_max,
    };
    
    // ...
    
    struct function_call : x3::position_tagged
    {
        funtoken fun;
        std::list<operand> args;
    };
    

    In ast_adapted.hpp:

    BOOST_FUSION_ADAPT_STRUCT(client::ast::function_call,
        fun, args
    )
    

    Step 2: extend grammar

    (This is all in expression_def.hpp)

    Let's be generic, so parse function name tokens using a symbol table:

    x3::symbols<ast::funtoken> functions;
    

    Which we have to initialize in add_keywords:

    functions.add
        ("min", ast::fun_min)
        ("max", ast::fun_max)
        ;
    

    Now declare a rule for function calls:

    struct function_call_class;
    typedef x3::rule<function_call_class, ast::function_call>    function_call_type;
    function_call_type const function_call = "function_call";
    

    That's all red-tape. The "interesting thing" is the rule definition:

    auto const function_call_def =
            functions
        >>  '(' >> expression % ',' >> ')'
        ;
    

    Well. That's underwhelming. Let's integrate into our primary expression rule:

    auto const primary_expr_def =
            uint_
        |   bool_
        |   function_call
        |   (!keywords >> identifier)
        |   ('(' > expression > ')')
        ;
    

    Note the ordering. If you want to be able to add function names that collide with a keyword, you'll need to add precautions.

    Also, lets make AST annotation work for our node:

    struct function_call_class : x3::annotate_on_success {};
    

    Code generation

    It's easy to find where to add support for the new AST node:

    In compiler.hpp:

     bool operator()(ast::function_call const& x) const;
    

    Now comes the hard part.

    What's really required for general n-ary is an accumulator. Since we don't have registers, this would need to be a temporary (local). However, since the VM implementation doesn't have these, I've limited the implementation to a fixed binary function call only.

    Note that the VM already has support for function calls. Functions can have locals. So, if you code-gen a variable-argument built-in function you can implement a left-fold recursive solution.

    In compiler.cpp:

    bool compiler::operator()(ast::function_call const& x) const
    {
        auto choice = [&](int opcode) {
            BOOST_ASSERT(x.args.size() == 2); // TODO FIXME hardcoded binary builtin
            auto it = x.args.begin();
    
            auto& a = *it++;
            if (!boost::apply_visitor(*this, a))
                return false;
    
            auto& b = *it++;
            if (!boost::apply_visitor(*this, b))
                return false;
            program.op(opcode); // the binary fold operation
    
            program.op(op_jump_if, 0);
            size_t const branch = program.size()-1;
    
            if (!boost::apply_visitor(*this, a))
                return false;
            program.op(op_jump, 0);
            std::size_t continue_ = program.size()-1;
    
            program[branch] = int(program.size()-branch);
            if (!boost::apply_visitor(*this, b))
                return false;
    
            program[continue_] = int(program.size()-continue_);
            return true;
        };
    
        switch (x.fun) {
            case ast::fun_min: return choice(op_lt);
            case ast::fun_max: return choice(op_gt);
            default: BOOST_ASSERT(0); return false;
        }
        return true;
    }
    

    I've just taken inspiration from the surrounding code on how to generate the jump labels.


    Trying It Out

    • A simplistic example would be: var x = min(1,3);

      Assembler----------------
      
      local       x, @0
      start:
            op_stk_adj  1
            op_int      1
            op_int      3
            op_lt
            op_jump_if  13
            op_int      1
            op_jump     15
      13:
            op_int      3
      15:
            op_store    x
      end:
      -------------------------
      Results------------------
      
          x: 1
      -------------------------
      
    • Running it with some random contrived input:

      ./test <<< "var a=$(($RANDOM % 100)); var 
      

      b=$(($RANDOM % 100)); var contrived=min(max(27,2*a), 100+b);"

      Prints e.g.:

      Assembler----------------
      
      local       a, @0
      local       b, @1
      local       contrived, @2
      start:
            op_stk_adj  3
            op_int      31
            op_store    a
            op_int      71
            op_store    b
            op_int      27
            op_int      2
            op_load     a
            op_mul
            op_gt
            op_jump_if  24
            op_int      27
            op_jump     29
      24:
            op_int      2
            op_load     a
            op_mul
      29:
            op_int      100
            op_load     b
            op_add
            op_lt
            op_jump_if  58
            op_int      27
            op_int      2
            op_load     a
            op_mul
            op_gt
            op_jump_if  51
            op_int      27
            op_jump     56
      51:
            op_int      2
            op_load     a
            op_mul
      56:
            op_jump     63
      58:
            op_int      100
            op_load     b
            op_add
      63:
            op_store    contrived
      end:
      -------------------------
      Results------------------
      
          a: 31
          b: 71
          contrived: 62
      -------------------------