This is a question about extending a very basic Shunting Yard expression parser.
I have a language with expressions consisting of numeric literals, alphanumeric variable names, operators "+", "*", and "-" with some arity, associativity and priority, and parentheses. But there are other statements in the language and the language has no line termination character.
What's the correct way to stop parsing or signal an error when the arity of operators is known?
Examples:
10 ==> 10
10+20*30 ==> 10 20 30 * +
(10+20)*30 ==> 10 20 + 30 *
a+b ==> a b +
10 a ==> 10 ; but leave "a" unparsed
10+a 30 ==> 10 a + ; but leave "30" unparsed
10+20* ==> error "missing argument for *"
The first four cases already work, now what about the last three?
It should be evident from the examples that the expression is terminated when you see two consecutive operands. (Assuming there are no parentheses on the stack. If you find two consecutive operands inside parentheses, you have a syntax error.)
An operand, here, is:
An identifier
A literal constant
An open parenthesis (
In general, when you are dividing an algebraic expression into tokens, you can be in one of two states'
Expecting an operand.
Expecting an operator.
These states alternate, except for parentheses. After you see an operand, you need an operator; after you see an operator, you need an operand. However, parentheses do not change the state. You can only see a ( when expecting an operand; afterwards, you are still expecting an operand. Similarly, you can only see a ) when you are using expecting an operator and afterwards you are still expecting an operator.
This simple state machine also lets you handle unary operators: a - is a binary operator if you are expecting an operator and a unary operator if you are expecting an operand.
The state machine starts in the "expecting an operand" state, and can only accept end of input in the "expecting an operator" state.