Search code examples
grammaryacclex

How to implement the grammar for this exercise in lex & yacc


Exercise

I'm trying to implement the grammar of this exercise using lex & yacc, but it throws this conflict: warning: 5 shift/reduce conflicts [-Wconflicts-sr] and takes every string as invalid, I think it has to be something with the precedence: It is supposed to accept strings like: ba#ababadada#bad#dabbada and reject: dad#ad#abaadad#badadbaad This is what I've tried:

%{
#include <stdio.h>
int flag=0;
%}
%{
void yyerror();
%}

%token A B D NL
%%
str1        :   Oracion NL      { printf("\n Sequence accepted");return(0);}
        ;
Oracion     :   Oracion '#' Parrafo     { }
        |   Parrafo         { }
        ;
Parrafo     :   Silaba Parrafo Silaba           { }
        |   Silaba              { }
        ;
Silaba      :   Explosion Alto      { }
        |   A Explosion             { }
        |   A Alto          { }
        |   Explosion           { }
        ;
Explosion   :   Alto A              { }
        ;
Alto        :   B               { }
        |   D               { }
        ;   
%%
void main()
{
    printf("\n write a sequence\n");
    yyparse();
    if(flag == 0)
        printf("\n Valid sequence");
}
void yyerror()
{
    printf("\n Invalid Sequence\n\n");
    flag=1;
}

Solution

  • That grammar allows four syllables (ignoring the non-syntactic distinction between b and d):

    ba
    bab
    ab
    aba
    

    A word is a sequence containing an odd number of syllables. Since syllable boundaries are not marked in any way, the division into syllables is ambiguous, which is why bison reports conflicts. For example, babab could be parsed as

    ba-bab  (Explosión / Explosión Alto)
    bab-ab  (Explosión Alto / "a" Alto)
    

    In a few cases, the fact that words must consist of an odd number of syllables can disambiguate a parse. For example, bababa can only be decomposed as ba-ba-ba (Explosión / Explosión / Explosión), because bab-aba (Explosión Alto / "a" Explosión) has only two syllables, an even number. In particular, this means that the greedy solution -- always using the longest possible syllable -- will sometimes produce an incorrect parse. Moreover, you cannot guarantee the correct parse without arbitrary lookahead, making an LR parser inappropriate. Consider the following two words:

    abababbabbabbabbabbabbab
    abababbabbabbabbabbabbabbab
    

    A double consonant is an unambiguous syllable break, so the only ambiguous break is in the first six characters, which -- as above -- could be two or three syllables. Looking at the entire word, it will be clear which of these choices leads to a word with an odd number of syllables. But until the entire word has been seen, the parser cannot know. In particular, after it sees the initial ab, it must decide whether to resolve that as a syllable of the form a Alto or shift the following a in order to resolve it as a Explosión. But any fixed lookahead limit won't be sufficient to make this decision in all cases.

    I suppose that the exercise is intended to help you understand this ambiguity (unless there is more to the exercise than what you showed in the question). I'm not sure what else you're being asked to do. If you want to use bison in order to actually produce parse trees, you'll have to deal with the ambiguity in some way, which probably means using a GLR parser.