Search code examples
grammarbisonbnfjison

Is this grammar ambiguous?


I'm trying to define a language using Jison with very little punctuation for delimitation - like CoffeeScript but without the indentation. This is sort of what I want to achieve:

# Definition
object1, object2
    property1 = value1,
    property2 = value2

# Definition
object3 property = value

# Statement
object1 + object2 + object3

This defines three objects with some properties and adds them. Notice that the first definition specifies two objects using a list of names, and the second demonstrates that whitespace should not be significant.

I feel like the grammar is not ambiguous despite the lack of start and end symbols between the list of names and the list of properties. The grammar specifies that every name list is followed by a property list. This all seems to work fine if I write a grammar just specifying the definitions, including productions like:

definition
    : name_list property_list
    ;

name_list
    : name
    | name_list ',' name
    ;

property_list
    : property
    | property_list ',' property
    ;

property
    : name '=' name
    ;

Now, I add rules for the expression part of the grammar, in a fairly normal way as far as I can tell:

expr
    : expr '+' expr
    | expr '/' expr
    | name
    ;

Jison complains that 'multiple actions possible' for a bunch of different lookahead tokens in some numbered state. The reduction options usually look like:

- reduce by rule: name_list -> name
- reduce by rule: expr -> name

I believe the grammar to be unambiguous, but how can I convince Jison of this? It seems it might need to look ahead two tokens instead of one, but that's a blind guess, and the Jison documentation notes that it does not (yet?) support LL(k) grammars.


Solution

  • You don't show the whole grammar, but it looks like your problem is that it can't tell the difference between an expr that is a simple name and the beginning of a declaration that has a single name in the name list. Consider the inputs

    A B = C
    

    and

    A B C = D
    

    the first case is a single definition of A with one property, while the second is an expression A followed by a definition for B.

    The problem is that the parser needs to decide between these cases after having seen A and looking at a lookahead of B, but it can't -- it needs more lookahead (to see what is after the B)

    There are a number of things that you can do to avoid this by either changing your language or getting (effectively) extra lookahead.

    1. Changing the language. It might be the case that a statement that is just a single name doesn't make any sense. So you could change the language to have a separate statement rule that disallows simple names:

      statement: expr '+' expr | expr '/' expr ;
      expr: statement | name ;
      

      now it can distinguish between a statement and a declaration without needing extra lookahead, as a statement must contain an operator.

    2. Change the tool. You can use bison's %glr-parser option or a tool like btyacc that can deal with non-LALR(1) grammars. I'm not at all sure what Jison supports, however.

    3. Simulate extra lookahead in the lexer. You can have your lexer do the extra lookahead for you. You could have a lexer pattern that matches [a-zA-Z]+[ \t\n]*= (that is, a name followed by an = sign) and have that return a special propname token instead of name. Then your property rule becomes:

      property: propname name ;