Search code examples
algorithmparsingcompiler-constructionabstract-syntax-tree

Mixing mathematical expression with control flow


I would like to know, how expressions are parsed when are mixed with control flow.

Let's assume such syntax:

case 
  when a == Method() + 1 
    then Something(1) 
  when a == Other() - 2
    then 1
  else 0
end

We've got here two conditional expressions, Method() + 1, Something(1) and 0. Each can be translated to postfix by Shunting-yard algorithm and then easily translated into AST. But is it possible to extend this algorithm to handle control - flow also? Or are there other approaches to solve such mixing of expressions and control flows?

another example:

a == b ? 1 : 2  

also how can I classify such expression: a between b and c, can I say that between is three arguments function? Or is there any special name for such expressions?


Solution

  • You can certainly parse the ternary operator with an operator-precedence grammar. In

    expr ? expr : expr
    

    the binary "operator" here is ? expr :, which conveniently starts and ends with an operator token (albeit different ones). To adapt shunting yard to that, assign the right-precedence of ? and the left-precedence of : to the precedence of the ?: operator. The left-precedence of ? and the right-precedence of : are ±∞, just like parentheses (which, in effect, they are).

    Since the case statement is basically repeated application of the ternary operator, using slightly different spellings for the tokens, and yields to a similar solution. (Here case when and end are purely parenthetic, while then and the remaining whens correspond to ? and :.)

    Having said that, it really is simpler to use an LALR(1) parser generator, and there is almost certainly one available for whatever language you are writing in.


    It's clear that both the ternary operator and OP's case statement are operator grammars:

    1. Ternary operator:

      ternary-expr: non-ternary-expr
                  | non-ternary-expr '?' expr ':' ternary-expr
      

      Normally, the ternary operator will be lower precedence from any other operator and associate to the right, which is how the above is written. In C and other languages ternary expressions have the same precedence as assignment expressions, which is straightforward to add. That results in the relationships

      • X ·> ?
      • ? <· X
      • ? ·=· :
      • X ·> :
      • : <· X
    2. Case statement (one of many possible formulations):

      case_statement: 'case' case_body 'else' expr 'end'
      case_body: 'when' expr 'then' expr
               | case_body 'when' expr 'then' expr
      

      Here are the precedence relationships for the above grammar:

      • case <· when
      • case ·=· else
      • when <· X (see below)
      • when ·=· then
      • then ·> when
      • then ·> else
      • else <· X
      • else ·=· end
      • X ·> then
      • X ·> when
      • X ·> end

      X in the above relations refers to any binary or unary operator, any value terminal, ( and ).

      It's straightforward to find left- and right-precedence functions for all of those terminals; the pattern will be similar to that of parentheses in a standard algebraic grammar.