I know that exponentiation has higher precedence than unary minus. However, if I build an expression parser based on that, I still can't parse expressions like 2---3. To deal with these, I've found I also need to add unary minus handling to the factor production rule, which is one precedence level higher than exponentiation. Is this how unary minus and exponentiation are usually dealt with? I've not found anything online or in books that talks about this particular situation. I was wondering whether giving exponentiation and the unary operators equal precedence would help.
I'm hand-crafting a recursive descent parser. I tried merging the power and unary production rules together, but it didn't seem to work. What does work is the following EBNF:
factor = '(' expression ')' | variable | number | '-' factor
power = factor { '^' factor }
unaryTerm = ['-' | '+'] power
term = unaryTerm { factorOp unaryTerm }
expression = term { termOp term }
termOp = '+' | '-'
factorOp = '*' | '/'
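For illustration, the EBNF above can be turned into a small recursive descent evaluator. This is a minimal sketch in Python, not the code the question is based on: the names `tokenize`, `Parser`, and `evaluate` are hypothetical, the parser evaluates to numbers rather than building a tree, and it assumes the `{ '^' factor }` repetition is folded right-to-left, which the EBNF itself does not specify.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text, variables=None):
        self.tokens = tokenize(text)
        self.pos = 0
        self.variables = variables or {}

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    # expression = term { termOp term }
    def expression(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term = unaryTerm { factorOp unaryTerm }
    def term(self):
        value = self.unary_term()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.unary_term()
            else:
                value /= self.unary_term()
        return value

    # unaryTerm = ['-' | '+'] power
    def unary_term(self):
        if self.peek() in ("+", "-"):
            sign = -1 if self.advance() == "-" else 1
            return sign * self.power()
        return self.power()

    # power = factor { '^' factor }
    def power(self):
        operands = [self.factor()]
        while self.peek() == "^":
            self.advance()
            operands.append(self.factor())
        # Assumption: fold right-to-left, so 2^3^2 is 2^(3^2).
        value = operands.pop()
        while operands:
            value = operands.pop() ** value
        return value

    # factor = '(' expression ')' | variable | number | '-' factor
    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token[0].isdigit():
            return float(token)
        if token[0].isalpha() or token[0] == "_":
            return self.variables[token]
        raise SyntaxError(f"unexpected token {token!r}")

def evaluate(text, variables=None):
    parser = Parser(text, variables)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, `evaluate("-2^2")` gives -4.0, while `evaluate("2^-3")` and `evaluate("2---3")` give 0.125 and -1.0; the last two only parse because of the `'-' factor` alternative.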
Unless you have unusual requirements, putting both unary minus and exponentiation in the same non-terminal will work fine, because exponentiation is right-associative (Yacc/bison syntax):
atom  : ID
      | '(' expr ')'
factor: atom
      | '-' factor
      | atom '^' factor
term  : factor
      | term '*' factor
expr  : term
      | expr '+' term
      | expr '-' term
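A hand-written equivalent of the grammar above might look like the following Python sketch. It is an illustration under assumptions, not a definitive implementation: `tokenize`, `Parser`, and `group` are hypothetical names, `ID` is taken to be any run of letters, digits, or underscores, and the parser returns a fully parenthesized string so that the grouping is visible.

```python
import re

def tokenize(text):
    # Assumption: ID is any run of word characters; other tokens are single
    # punctuation characters. Anything else (e.g. spaces) is silently skipped.
    return re.findall(r"\w+|[-+*^()]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    # expr: term | expr '+' term | expr '-' term
    # (the left recursion becomes a loop, which keeps it left-associative)
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = f"({node} {op} {self.term()})"
        return node

    # term: factor | term '*' factor
    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = f"({node} * {self.factor()})"
        return node

    # factor: atom | '-' factor | atom '^' factor
    def factor(self):
        if self.peek() == "-":
            self.advance()
            return f"(-{self.factor()})"
        node = self.atom()
        if self.peek() == "^":
            self.advance()
            # Recursing into factor makes '^' right-associative and lets
            # a unary minus appear directly in the exponent.
            return f"({node} ^ {self.factor()})"
        return node

    # atom: ID | '(' expr ')'
    def atom(self):
        token = self.advance()
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

def group(text):
    parser = Parser(text)
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

Here `group("-2^2")` returns `(-(2 ^ 2))` and `group("2^-3^2")` returns `(2 ^ (-(3 ^ 2)))`, so a unary minus in the exponent covers the whole of `3^2`. In the question's EBNF, where `'-' factor` sits below `power`, the same minus would apply to `3` alone.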
Indeed, exponentiation being right-associative is virtually required for this syntax to be meaningful. Consider the alternative, with a left-associative operator.
Let's say we have two operators, ⊕ and ≀, with ⊕ being left-associative and binding more tightly than ≀, so that ≀ a ⊕ b is ≀(a ⊕ b).
Since ⊕ is left-associative, we would expect a ⊕ b ⊕ c to be parsed as (a ⊕ b) ⊕ c. But then we get an oddity. Is a ⊕ ≀ b ⊕ c the same as (a ⊕ ≀b) ⊕ c or the same as a ⊕ ≀(b ⊕ c)? Both options seem to violate the simple patterns. [Note 1]
Certainly, an unambiguous grammar could be written for each case, but which one would be less surprising to a programmer who was just going by the precedence chart? The most likely result would be a style requirement that ≀ expressions always be fully parenthesized, even if the parentheses are redundant. (C style guides are full of such recommendations, and many compilers will chide you for using correct but "unintuitive" expressions.)
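To make the two candidate readings concrete, here is a small Python sketch that produces either grouping. It is only an illustration: `@` stands in for ⊕ and `~` for ≀, tokens are assumed to be separated by spaces, and `group` and its `greedy` flag are hypothetical names. With `greedy=False`, a `~` in operand position takes a single operand; with `greedy=True`, it takes everything to its right.

```python
def group(text, greedy):
    tokens = text.split()  # assumption: tokens are space-separated
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def primary():
        nonlocal pos
        token = peek()
        if token is None or not token.isalnum():
            raise SyntaxError(f"expected an operand, got {token!r}")
        pos += 1
        return token

    def operand():
        # Non-greedy reading: a prefix '~' applies to one operand only.
        nonlocal pos
        if peek() == "~":
            pos += 1
            return "~" + operand()
        return primary()

    def expr():
        nonlocal pos
        # A leading '~' always covers the whole chain: ~ a @ b is ~(a @ b).
        if peek() == "~":
            pos += 1
            return "~" + expr()
        node = primary()
        while peek() == "@":
            pos += 1
            if greedy and peek() == "~":
                # Greedy reading: '~' swallows the rest of the expression.
                return f"({node} @ {expr()})"
            node = f"({node} @ {operand()})"  # left-associative chain
        return node

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For the input `a @ ~ b @ c`, `greedy=False` yields `((a @ ~b) @ c)` and `greedy=True` yields `(a @ ~(b @ c))`. Both settings agree on `~ a @ b` and on `a @ b @ c`, which is why a precedence chart alone does not tell a reader which one to expect.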
a ⊕ ≀(b ⊕ c), which might or might not be intuitive, depending on your intuitions.