Search code examples
boost-spiritboost-spirit-qiboost-spirit-lex

Eliminating syntactic sugar using Spirit.Qi


I'm trying to parse a lisp-like language which has some syntactic sugar for common functions. For example, the plus function can be written as (+ 1 2) or as 1 + 2. I'm thinking that eliminating syntactic sugar before trying to interpret the language would significantly facilitate the interpretation process, because then, the only evaluation rule that needs to be implemented is going to be that of a function call, and all infix operators would have to have corresponding built-in functions. I was thinking I could create a parser that will iterate over the tokens received from the lexer, and reorder them to put the expression into prefix notation. But this would require the output of the parser to also be a list of tokens. Is that possible in Spirit.Qi? As far as I understand, Spirit.Qi builds a hierarchical structure not a linear list of tokens.


Solution

  • Everything is, of course, possible.

    That said, why don't you operate on the AST, and

    • don't worry about the input representation while interpreting
    • don't worry about interpreting while lexing/parsing

    Consider an AST representation like this (inspired on a kind of simplified s-expression):

    typedef boost::make_recursive_variant<
            std::string,
            std::vector< boost::recursive_variant_ >
       >::type expr_t;
    

    You'd be able to define rules to accept either s_expr or binobj and parse into the same AST

    qi::rule<It, std::vector<expr_t>(), Skipper> binop;
    qi::rule<It, expr_t(), Skipper> expr, s_expr, name;
    
    expr   = binop | s_expr;
    
    binop = (s_expr >> (string("+") | string("-") | string("*") | string("/")) >> expr)
        [
            phx::push_back(_val, _2),
            phx::push_back(_val, _1),
            phx::push_back(_val, _3)
        ];
    
    name = as_string [ lexeme [ +(graph - char_("()") ) ] ];
    
    s_expr = ('(' > +expr > ')') | name;
    

    Full Demo

    I have a full demo of this idea here: https://gist.github.com/3047788


    For fun, I threw in

    A demo main of test.cpp:

    int main()
    {
        for (const auto input : std::list<std::string> {
             "",
             "(+ 1 3)",
             "(1 + 3)",
             "(1 + 4 / 2)",
             "(1 + (* 4 5 6) / 2)",
             "(avg   1 + (* 4 5 6) / 2)",
             "(trace 1 + (* 4 5 6) / 2 1 1 1 1 999)",
             "(avg   1 + (* 4 5 6) / 2 1 1 1 1 999)",
             "(avg   1 + (* 4 (unknown-function 5) 6) / 2 a b c)",
         })
        {
            std::cout << "----------------------\n";
            expr_t ast = doParse(input, qi::space);
    
            try 
            {
                std::cout << "eval<double>: " << eval<double>(ast) << std::endl;
                std::cout << "eval<int>   : " << eval<int>   (ast) << std::endl;
            } catch(std::exception const& e)
            {
                std::cerr << e.what() << '\n';
            }
        }
    }
    

    Results in the following output:

    ----------------------
    parse failed: ''
    warning: undefined '<no ast>'
    eval<double>: 0
    warning: undefined '<no ast>'
    eval<int>   : 0
    ----------------------
    parse success
    input       : (+ 1 3)
    ast parsed  : (+ 1 3)
    eval<double>: 4
    eval<int>   : 4
    ----------------------
    parse success
    input       : (1 + 3)
    ast parsed  : ((+ 1 3))
    eval<double>: 4
    eval<int>   : 4
    ----------------------
    parse success
    input       : (1 + 4 / 2)
    ast parsed  : ((+ 1 (/ 4 2)))
    eval<double>: 3
    eval<int>   : 3
    ----------------------
    parse success
    input       : (1 + (* 4 5 6) / 2)
    ast parsed  : ((+ 1 (/ (* 4 5 6) 2)))
    eval<double>: 61
    eval<int>   : 61
    ----------------------
    parse success
    input       : (avg   1 + (* 4 5 6) / 2)
    ast parsed  : (avg (+ 1 (/ (* 4 5 6) 2)))
    eval<double>: 0
    eval<int>   : 0
    ----------------------
    parse success
    input       : (trace 1 + (* 4 5 6) / 2 1 1 1 1 999)
    ast parsed  : (trace (+ 1 (/ (* 4 5 6) 2)) 1 1 1 1 999)
    trace( 61 1 1 1 1 999 )
    eval<double>: 0
    trace( 61 1 1 1 1 999 )
    eval<int>   : 0
    ----------------------
    parse success
    input       : (avg   1 + (* 4 5 6) / 2 1 1 1 1 999)
    ast parsed  : (avg (+ 1 (/ (* 4 5 6) 2)) 1 1 1 1 999)
    eval<double>: 167.167
    eval<int>   : 167
    ----------------------
    parse success
    input       : (avg   1 + (* 4 (unknown-function 5) 6) / 2 a b c)
    ast parsed  : (avg (+ 1 (/ (* 4 (unknown-function 5) 6) 2)) a b c)
    error: can't apply 'unknown-function' to 1 args (std::exception)
    

    1 bear in mind, there is no type system, which is why you have to specify the hardcoded datatype to interpret all values as (e.g. double result = eval<double>(ast)). There is also no symbol table, so only builtin functions +-/* trace and avg exist