Search code examples
parsinggrammarcontext-free-grammarbnfc

Disallowing unnecessary outermost brackets in a BNFC-grammar


This is a continuation to this question I asked earlier about a BNFC-grammar for propositional logic. I got it working with parentheses, as per the definition, but I would now like to extend the grammar to work without parentheses, with a catch however: no unnecessary outer parentheses allowed.

For example, the atomic sentence a should be allowed, but (a) should not be recognized. The sentence (a => b) & c should also be allowed, but ((a => b) & c) not, and so forth. The last example highlights the necessity for paretheses. The precedence levels are

  1. equivalence <=> and implication =>,
  2. conjuction & and disjunction |
  3. negation - and
  4. atoms.

The higher the level, the earlier it will be parsed.

I got the grammar working with the unnecessary parentheses, by setting precedence levels to the different operators via recursion:

IFF     .   L     ::=   L   "<=>" L1  ;
IF      .   L     ::=   L   "=>"  L1  ;
AND     .   L1    ::=   L1  "&"   L2  ;
OR      .   L1    ::=   L1  "|"   L2  ;
NOT     .   L2    ::=       "-"   L3  ;
NOT2    .   L2    ::=       "-"   L2  ;
P       .   L3    ::=   Ident         ;

_       .   L     ::=   L1            ;
_       .   L1    ::=   L2            ;
_       .   L2    ::=   L3            ;
_       .   L3    ::=   "(" L ")"     ;

Now the question is, how do I not allow the outer parentheses, the allowance of which is caused by the last rule L3 ::= "(" L ")";? It is strictly necessary for allowing parentheses inside an expression, but it also allows them on the edges. I guess I need some extra rule for removing ambiguity, but what might that be like?

This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?


Solution

  • You can do this by simply banning the parenthesised form from the toplevel. This requires writing the precedence hierarchy in a different fashion, in order to propagate the restriction through the hierarchy. In the following, the r suffix indicates that the production is "restricted" to not be a parenthesised form.

    I also fixed the reduce/reduce conflicts by eliminating one of the NOT productions. See below.

    (I hope I got the BNFC right. I wrote this in bison and tried to convert the syntax afterwards.)

    _       .   S     ::=   L0r             ;
    
    IFF     .   L0r   ::=   L0 "<=>" L1     ;
    IF      .   L0r   ::=   L0 "=>"  L1     ;
    
    AND     .   L1r   ::=   L1 "&"   L2     ;
    OR      .   L1r   ::=   L1 "|"   L2     ;
    
    NOT     .   L2r   ::=       "-"   L2    ;
    ID      .   L2r   ::=   Ident           ;                                            
    
    PAREN   .   L3    ::=   "(" L0 ")"      ;
    
    _       .   L0r   ::=   L1r             ;
    _       .   L1r   ::=   L2r             ;
    
    _       .   L0    ::=   L0r             ;
    _       .   L1    ::=   L1r             ;
    _       .   L2    ::=   L2r             ;
    
    _       .   L0    ::=   L3              ;
    _       .   L1    ::=   L3              ;
    _       .   L2    ::=   L3              ;
    

    (Edit: I changed the IFF, IF, AND and OR rules by removing the restriction (r) from the first arguments. That allows the rules to match expressions which start with a parenthesis without matching the PAREN syntax.)

    If you also wanted to disallow redundant internal parentheses (like ((a & b))), you could change the PAREN rule to

    PAREN   .   L3    ::=   "(" L0r ")"     ;                       
    

    which would make the L0 rule unnecessary.

    A variant approach which uses fewer unit productions can be found in the answer by @IraBaxter to Grammar for expressions which disallows outer parentheses.

    Side note:

    This grammar also results in about 6 reduce/reduce conflicts, but aren't those pretty much inevitable in recursive definitions?

    No, recursive grammars can and should be unambiguous. Reduce/reduce conflicts are not inevitable, and almost always indicate problems in the grammar. In this case, they are the result of the redundant productions for the unary NOT operator. Having two different non-terminals which can both accept "-" L3 is obviously going to lead to an ambiguity, and ambiguities always produce parsing conflicts.