Search code examples
parsinggrammarbisonambiguityglr

Why does this very simple grammar cause GLR parsers to choke?


I've tried several different parser generators (Bison, DParser, etc.) that claim to be able to generate GLR parsers i.e., ones that can handle ambiguous grammars. Here is a very simple ambiguous grammar of the type I'm talking about:

START: A | B;
A: C | D;
B: C | D;
C: T1 | T2;
D: T3 | T4;
T1: 't1';
T2: 't2';
T3: 't3';
T4: 't4';

I can generate the parsers just fine, but I get "unresolved ambiguity" errors or just outright crashes when I give the parser input that should be valid. There are no problems of any kind when I change the grammar to an unambiguous version.

What am I not understanding about GLR parsers? I thought the whole point was that in cases of ambiguity, ALL possible parses are tracked up until they merge or reach a dead end. All I need is a parser that can tell me whether there is ANY valid parse of the input.

Thanks for any help.

edit:

This is frustrating. Using %dprec and %merge I've been able to get Bison to handle ambiguous rules and terminals, but it still chokes on very simple but highly pathological pseudo-English grammars of the kind that I need to handle:

S: NP VP | NPA VP;
NPA: D N | NP PP;
NP: D N | NP PP | NPA;
VP: V NP | VP PP;
PP: P NP;
D: "the" | "a";
P: "in" | "with";
V: "saw";
N: "saw" | "girl" | "boy" | "park" | "telescope";

With input "a boy saw a girl", Bison is unable to parse and returns with code 1. Tom on the other hand, deals with this grammar and this input sentence flawlessly, and even naturally handles unknown terminals by just assigning them to all possible terminal types. But unlike Bison, Tom chokes on large grammars. (By "chokes" I mean fails in various different ways. If the failure modes would be helpful, I can report those.)

Anyone have any other ideas?


Solution

  • Unfortunately, bison really insists on producing a (single) parse, so you have to specify some way to merge ambiguous parses. If you don't, and there is more than one possible parse, bison's GLR parser will indeed complain that the parse is ambiguous.

    If you don't really care which of the multiple parses is accepted, then it's not too difficult to bend bison to your will. The simplest way is to just assign a different %dprec to every possibly ambiguous production. Bison will then select whichever applicable production happens to have the best precedence.

    You can even get bison to tell you about multiple parses with a simple %merge function; there is an example in the bison manual. (The documentation of this feature isn't great but it might be adequate to your needs. If not, feel free to ask a more specific question.)

    I don't have much experience with DParser, but the manual indicates that its behaviour when faced with multiple possible parses is similar: the default is to complain, but you can provide a trivial merge function of your own: (The quote comes from Section 12, Ambiguities)

    Ambiguities are resolved automatically based on priorities and associativities. In addition, when the other resolution techniques fail, user defined ambiguity resolution is possible. The default ambiguity handler produces a fatal error on an unresolved ambiguity. This behavior can be replaced with a user defined resolvers the signature of which is provided in dparse.h.


    Here's an example bison GLR grammar for the second example. I left out the lexer, which is really not relevant (and slightly embarrassing because I was rushing).

    %{
      int yylex();
      void yyerror(const char* msg);
    %}
    
    %error-verbose
    %glr-parser
    
    %token WORD_A "a"
    %token WORD_BOY "boy"
    %token WORD_GIRL "girl"
    %token WORD_IN "in"
    %token WORD_LIKED "liked"
    %token WORD_PARK "park"
    %token WORD_SAW "saw"
    %token WORD_TELESCOPE "telescope"
    %token WORD_THE "the"
    %token WORD_WITH "with"
    
    %%
    S  : NP VP      {puts("S: NP VP");}     %dprec 1
       | NPA VP     {puts("S: NPA VP");}    %dprec 2
       ;
    NPA: D N        {puts("NPA: D N");}     %dprec 3
       | NP PP      {puts("NPA: NP PP");}   %dprec 4
       ;
    NP : D N        {puts("NP: D N");}      %dprec 6
       | NP PP      {puts("NP: NP PP");}    %dprec 7
       | NPA        {puts("NP: NPA");}      %dprec 10
       ;
    VP : V NP       {puts("VP: V NP ");}    %dprec 11
       | VP PP      {puts("VP: VP PP");}    %dprec 12
       ;
    PP : P NP       {puts("PP: P NP");}     %dprec 14
       ;
    D  : "the"      {puts("D: the");}       %dprec 15
       | "a"        {puts("D: a");}         %dprec 16
       ;
    P  : "in"       {puts("P: in");}        %dprec 17
       | "with"     {puts("P: with");}      %dprec 18
       ;
    V  : "liked"    {puts("V: liked");}     %dprec 19
       | "saw"      {puts("V: saw");}       %dprec 20
       ;
    N  : "girl"     {puts("N: girl");}      %dprec 21
       | "boy"      {puts("N: boy");}       %dprec 22
       | "park"     {puts("N: park");}      %dprec 23
       | "saw"      {puts("N: saw");}       %dprec 24
       | "telescope"{puts("N: telescope");} %dprec 25
       ;
    %%
    
    int main(int argc, char** argv) {
      printf("yyparse returned %d\n", yyparse());
      return 0;
    }
    

    Compilation:

    $ make ambig2
    bison30 -v -d -o ambig2.c ambig2.y 
    ambig2.y: warning: 6 shift/reduce conflicts [-Wconflicts-sr]
    ambig2.y: warning: 10 reduce/reduce conflicts [-Wconflicts-rr]
    gcc-4.8 -ggdb -Wall -D_POSIX_C_SOURCE=200809L -std=c99 -c -o ambig2.o ambig2.c
    gcc-4.8   ambig2.o   -o ambig2
    rm ambig2.o ambig2.c
    

    Sample parses:

    $ ./ambig2 <<<"a boy saw a girl"
    D: a
    N: boy
    NPA: D N
    V: saw
    D: a
    N: girl
    NPA: D N
    NP: NPA
    VP: V NP 
    S: NPA VP
    yyparse returned 0
    
    $ ./ambig2 <<<"a saw saw the saw in a saw"
    D: a
    N: saw
    NPA: D N
    V: saw
    D: the
    N: saw
    NPA: D N
    NP: NPA
    VP: V NP 
    P: in
    D: a
    N: saw
    NPA: D N
    NP: NPA
    PP: P NP
    VP: VP PP
    S: NPA VP
    yyparse returned 0