Search code examples
cparsingyacclalr

yacc parser reduces before left recursion suffix


I wrote a pretty simple left-recursive grammar in yacc, based on data types for a simple C-like language. The generated yacc parser reduces before the left recursion suffix, and I have no clue why.

Here is the source code:

%%
start: type {
        printf("Reduced to start.\n");
    };

type: pointer {
        printf("Reduced pointer to type.\n");
    } 
    | char {
        printf("Reduced char to type.\n");
    };

char: KW_CHAR {
        printf("Reduced to char.\n");
    };

pointer: type ASTERISK {
        printf("Reduced to pointer.\n");
    };
%%

Given the input char * (KW_CHAR ASTERISK):

Reduced to char.
Reduced char to type.
syntax error

Solution

  • You don't seem to be asking about the parse error you're receiving (which, as noted in the comments, is likely an issue with the token types being returned by yylex()), but rather why the parser reductions are performed in the order shown by your trace. That's the question I will try to address here, even though it's quite possible that it's an XY problem, because understanding reduction order is important.

    In this case, the reduction order is pretty simple. If you have a production:

    pointer: type ASTERISK 
    

    then type must be reduced before ASTERISK is shifted. In an LR parser, reductions must be done when the right-hand side to be reduced ends exactly at the last consumed input token. (The lookahead token has been identified by the lexical scanner but not yet consumed by the parser. So it's not part of the reduction and can be used also to identify other reductions ending at the same point.)

    I hope it's more evident why with the production

    type: char
    

    char needs to be reduced before type. Until char is reduced, it's not available to be used in the reduction of type.

    Really, these two examples show the same behaviour. Reductions are performed left-to-right, and from the bottom up (that is, children first). Hence the name for this kind of parsing.

    So the reduction order shown by your parser (first char, then type, and only after the * is shifted, pointer) is precisely what would be expected.