I would like to know, how expressions are parsed when are mixed with control flow.
Let's assume such syntax:
case
when a == Method() + 1
then Something(1)
when a == Other() - 2
then 1
else 0
end
We've got here two conditional expressions, Method() + 1
, Something(1)
and 0
. Each can be translated to postfix by Shunting-yard algorithm
and then easily translated into AST
. But is it possible to extend this algorithm to handle control - flow also? Or are there other approaches to solve such mixing of expressions and control flows?
another example:
a == b ? 1 : 2
also how can I classify such expression: a between b and c
, can I say that between
is three arguments function? Or is there any special name for such expressions?
You can certainly parse the ternary operator with an operator-precedence grammar. In
expr ? expr : expr
the binary "operator" here is ? expr :
, which conveniently starts and ends with an operator token (albeit different ones). To adapt shunting yard to that, assign the right-precedence of ? and the left-precedence of : to the precedence of the ?:
operator. The left-precedence of ? and the right-precedence of : are ±∞, just like parentheses (which, in effect, they are).
Since the case
statement is basically repeated application of the ternary operator, using slightly different spellings for the tokens, and yields to a similar solution. (Here case when
and end
are purely parenthetic, while then
and the remaining when
s correspond to ?
and :
.)
Having said that, it really is simpler to use an LALR(1) parser generator, and there is almost certainly one available for whatever language you are writing in.
It's clear that both the ternary operator and OP's case statement are operator grammars:
Ternary operator:
ternary-expr: non-ternary-expr
| non-ternary-expr '?' expr ':' ternary-expr
Normally, the ternary operator will be lower precedence from any other operator and associate to the right, which is how the above is written. In C and other languages ternary expressions have the same precedence as assignment expressions, which is straightforward to add. That results in the relationships
X ·> ?
? <· X
? ·=· :
X ·> :
: <· X
Case statement (one of many possible formulations):
case_statement: 'case' case_body 'else' expr 'end'
case_body: 'when' expr 'then' expr
| case_body 'when' expr 'then' expr
Here are the precedence relationships for the above grammar:
case <· when
case ·=· else
when <· X
(see below)when ·=· then
then ·> when
then ·> else
else <· X
else ·=· end
X ·> then
X ·> when
X ·> end
X
in the above relations refers to any binary or unary operator, any value terminal, (
and )
.
It's straightforward to find left- and right-precedence functions for all of those terminals; the pattern will be similar to that of parentheses in a standard algebraic grammar.