Search code examples
bisonflex-lexershift-reduce-conflictambiguous-grammar

Bison how to describe optional syntax in grammar without shift-reduce conflicts?


I have a file that is described with a grammar. It has a section that can consist of one or two kinds of contents, and it can be in arbitrary order:

...
type_a_thing
type_b_thing
type_b_thing
type_a_thing
....

Or just

...
type_a_thing
...

or

...
type_b_thing
type_b_thing
...

or any combinations, in any number of occurences. Both type_a_thing andtype_b_thing has a well defined structure. I have managed to describe this so that the parser works, but I still get shift/reduce errors. I have uploaded a minimal example here:

https://github.com/waszil/minimal_bison_parser

Is this the right way to address this problem? Am I doing it wrong? I have tried a lot of things for this, checked the .output file generated by bison with the verbose flag, but I don't know, how should it be done properly. It is somewhat similar to the nested-list grammar problem that is described in the Flex&Bison O'Reilly book, but not the same

Thanks for any hints!


Solution

  • Look at this part of your grammar:

    contents:
        foobar
        | contents foobar
        ;
    foobar:
        foos
        | bars
        ;
    foos:
        foo
        | foos foo
        ;
    bars:
        bar
        | bars bar
    ;
    

    So contents is a list of foobars, and foobar is either a list of foos or a list of bars. This is ambiguous because an input consisting of two consecutive foos could be parsed as a contents by either interpreting the two foos as a single foobar containing two foos or as two foobars containing one foo each.

    An easy way to get rid of this ambiguity is forgo the inner lists:

    contents: foobar | contents foobar;
    
    foobar: foo | bar;
    

    If you need consecutive foos to be handled differently, you could still detect them during post-processing. If you absolutely need to handle this in the grammar, you can restructure the grammar so that a foos can only be followed by a bars (not another foos) or vice-versa.