antlr4 grammar

Is this just a flawed grammar?


I was looking through a grammar for FOCAL and found that someone had defined numbers as follows:

number
   : mantissa ('e' signed_)?
   ;

mantissa
   : signed_
   | (signed_ '.')
   | ('.' signed_)
   | (signed_ '.' signed_)
   ;

signed_
   : PLUSMIN? INTEGER
   ;

PLUSMIN
   : '+'
   | '-'
   ;

I was curious because I thought this would mean that, for example, 1.-1 would be recognized by the grammar as a number rather than as a subtraction. Would a branch using an unsigned_ rule be worth adding to prevent this? I guess this is more of a question for the author, but are there any benefits to structuring it this way (besides the obvious one of not having to distinguish floats from ints)?


Solution

  • It’s not necessarily flawed.

    It does appear that it will recognize 1.-1 as a mantissa. However, that doesn’t rule out some post-parse validation catching the problem.
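
One quick way to check that claim without generating a parser is to approximate the rules with regular expressions. This is only a rough sketch, not ANTLR itself: it assumes INTEGER matches one or more digits (the lexer rule isn't shown in the question), it ignores whitespace handling, and the names `SIGNED`, `MANTISSA`, `NUMBER`, and `is_number` are made up for illustration.

```python
import re

# Assumption: INTEGER is [0-9]+ (the lexer rule is not shown in the question).
INTEGER = r"[0-9]+"

# signed_ : PLUSMIN? INTEGER
SIGNED = rf"[+-]?{INTEGER}"

# mantissa : signed_ | signed_ '.' | '.' signed_ | signed_ '.' signed_
MANTISSA = rf"(?:{SIGNED}\.{SIGNED}|{SIGNED}\.|\.{SIGNED}|{SIGNED})"

# number : mantissa ('e' signed_)?
NUMBER = re.compile(rf"{MANTISSA}(?:e{SIGNED})?")


def is_number(text: str) -> bool:
    """Return True if the whole string matches the regex model of `number`."""
    return NUMBER.fullmatch(text) is not None


print(is_number("1.-1"))    # signed_ '.' signed_ allows a sign after the dot
print(is_number("1.5e-3"))  # an ordinary number with an exponent
print(is_number("1-1"))     # no alternative of mantissa covers this input
```

Under those assumptions the model accepts 1.-1 through the `signed_ '.' signed_` alternative, which matches the reading of the grammar above.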

    It would be flawed if there’s an alternative, valid interpretation of 1.-1 (such as the subtraction you mention) that this rule prevents.

    Sometimes it’s useful to recognize an invalid construct and produce a parse tree for “the only way to interpret this input”. You can then detect it in a listener and give the user an error message that is more meaningful than the default message ANTLR would produce.
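
The "recognize it loosely, then complain" idea can be sketched without the ANTLR runtime. In a real project this check would more likely live in the listener callback that ANTLR generates for the `mantissa` rule; here a plain function stands in for it. The function name `check_mantissa` and the error wording are hypothetical, and INTEGER is again assumed to match one or more digits.

```python
import re

# Loose pattern mirroring  mantissa : signed_ '.' signed_
# Assumption: INTEGER is [0-9]+ (the lexer rule is not shown in the question).
LOOSE_MANTISSA = re.compile(r"([+-]?[0-9]+)\.([+-]?[0-9]+)")


def check_mantissa(text: str) -> list:
    """Return error messages for a mantissa that was recognized but is not valid."""
    errors = []
    match = LOOSE_MANTISSA.fullmatch(text)
    # Group 2 is the part after the dot; a leading sign there is the suspicious case.
    if match and match.group(2)[0] in "+-":
        errors.append(f"'{text}': a sign is not allowed after the decimal point")
    return errors


print(check_mantissa("1.-1"))  # one error message
print(check_mantissa("1.5"))   # empty list, nothing to report
```

The point of the design is that the permissive rule still yields a parse tree, so the message can name the exact problem instead of falling back to a generic syntax error.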

    And, then again, it could also just be an oversight.

    The `signed_` rule, on the other hand, being

    signed_ : PLUSMIN? INTEGER;

    instead of

    signed_ : PLUSMIN? INTEGER+;

    does make this grammar somewhat suspect as a good example to work from.