Search code examples
grammarbisonlalr

Ambiguous grammar (using Bison)


I've got a problem with an ambiguous grammar. I've got this:

%token identifier
%token lolcakes

%start program

%%

program
    : call_or_definitions;

expression
    : identifier
    | lolcakes;

expressions
    : expression
    | expressions ',' expression;

call_or_definition
    : function_call
    | function_definition;

call_or_definitions
    : call_or_definition
    | call_or_definitions call_or_definition;

function_argument_core
    : identifier
    | identifier '=' expression
    | identifier '=' '{' expressions '}';

function_call
    : expression '(' function_arguments ')' ';';

function_definition
    : identifier '(' function_definition_arguments ')' '{' '}';

function_argument
    : lolcakes
    | function_argument_core;

function_arguments
    : function_argument 
    | function_arguments ',' function_argument

function_definition_argument
    : expression function_argument_core
    | function_argument_core;

function_definition_arguments
    : function_definition_argument
    | function_definition_arguments ',' function_definition_argument;

It's a subset of my genuine grammar which is separately compilable. At the moment, it generates an S/R conflict between function_call and function_definition when encountering the stream identifier (. I'm trying to convince Bison that it doesn't need to make the decision until later in the token stream by unifying the grammar for function calls and function definitions. In other words, if it encounters something that's common to both calls and definitions, it can reduce that without needing to know which is which, and if it encounters something else, that something else would clearly label which is which. Is that even possible, and if so, how can I do it? I'd really rather avoid having to alter the structure of the input stream if possible.


Solution

  • The problem should not arise until you see identifier ( identifier with a lookahead of , or ). At that point the parser has to decide whether to reduce the second identifier as a function_definition_argument or an expression (to become function_argument).

    You can solve this purely in the grammar by brute force, but it will lead you into a maze of nonterminals like expression_not_naked_identifier and ambiguous_begining_of_function_defn_or_call, with resulting rampant duplication of semantic actions.

    It would probably be more maintainable (and lead to more intelligible syntax error messages) to write something like

    definition_or_call_start: identifier '(' generic_argument_list ')'
    generic_argument_list: generic_argument
                         | generic_argument_list ',' generic_argument
    generic_argument: expression 
                    | function_argument_core
                    | ...
    function_call: definition_or_call_start ';';
    function_definition : definition_or_call_start '{' '}';
    

    and then check as a semantic constraint in the action for the last two productions that the actual generic_arguments you have parsed match the use they're being put to.