Search code examples
parsinginterpreterlalrlemon

Using midaction rules in Lemon to interpret "let" expression


I'm trying to write a "toy" interpreter using Flex + Lemon that supports a very basic "let" syntax where a variable X is temporarily bound to an expression. For example, "letx 3 + 4 in x + 8" should evaluate to 15.

In essence, what I'd "like" the rule to say is:

expr(E) ::= LETX expr(N) IN expr(O). {
                                         environment->X = N;  
                                         E = O; 
                                     }

But that won't work since O is evaluated before the X = N assignment is made.

I understand that the usual solution for this would be a mid-rule action. Lemon doesn't explicitly support this, but I've read elsewhere that would just be syntactic sugar in any event.

So I've tried to put together a mid-rule action that would do my assignment of X = N before interpreting O:

midruleaction ::=  /* mid rule */.                   { environment->X = N; }    
expr(E)  ::= LETX expr(N) IN midruleaction expr(O).  {  E = O; }

But that won't work because there's no way for the midruleaction rule to access N, or at least none I can see in the lemon docs/examples.

I think I'm missing something here. I know I could build up a tree and then walk it in a second pass. And I might end up doing that, but I'd like to understand how to solve this more directly first.

Any suggestions?


Solution

  • It's really not a very scalable solution to evaluate immediately in a parser. See below.

    It is true that mid-rule actions are (mostly) syntactic sugar. However, in most cases they are not syntactic sugar for "markers" (non-terminals with empty right-hand sides) but rather for non-terminals representing production prefixes. For example, you could write your letx rule like this:

    expr(E)     ::= letx_prefix IN expr(O). { E = O; }
    letx_prefix ::= LETX expr(N).           { environment->X = N; }  
    

    Or you could do this:

    expr(E)       ::= LETX assigned_expr IN expr(O). { E = O; }
    assigned_expr ::= expr(N).                       { environment->X = N; }
    

    The first one is the prefix desugaring; the second one is the one I'd use because I feel that it separates concerns better. The important point is that the environment->X = N; action requires access to the semantic values of the prefix of the RHS, so it must be part of a prefix rule (which includes at least the symbols whose semantic values are requires), rather than a marker, which has access to no semantic values at all.

    Having said all that, immediate evaluation during parsing is a very limited strategy. It cannot cope with a large range of constructs which require deferred evaluation, such as loops and function definitions. It cannot cleanly cope with constructs which may suppress evaluation, such as conditionals and short-circuit operators. (These can be handled using MRAs and a stateful environment which contains an evaluation-suppressed flag, but that's very ugly.)

    Another problem is that syntactically incorrect expressions may be partially evaluated before the syntax error is discovered, and it may not be immediately obvious to the user which parts of the expression have and have not been evaluated.

    On the whole, you're better off building an easily-evaluated AST during the parse, and evaluating the AST when the parse successfully completes.