Search code examples
bisonreduce-reduce-conflict

Reduce/reduce conflict in a C-style cast operator


I am writing a bison+flex parser for a certain query language and I need to add a C-style cast operator to it. Here is the relevant part of the code:

%token <characterToken>  Identifier
%token <commandToken>    LRPAR RRPAR

%type <characterToken>   typename
%type <operationValue>   generalExp castExp variable

%%
generalExp: variable
      | LRPAR generalExp RRPAR
       { /* some code here */ }
      | castExp
      ;

castExp: LRPAR typename RRPAR generalExp
   { /* some code here */ }
   ;

variable: Identifier 
    { /* some code here*/ };
typename: Identifier;
%%

The problem here is that (typename) in castExp clashes with (variable) in generalExp and gives a reduce/reduce conflict:

test.yy: conflicts: 1 reduce/reduce
test.yy:23.11-20: warning: rule useless in parser due to conflicts: typename: Identifier

Possible solution could be to list all valid typenames (like long, int, char) in the corresponding lex file, however I need to support used-defined types as well.

UPD: another solution is to use bison GLR-parser, which I don't want due to performance drop.

bison -v output is here.


Solution

  • Yes, the C cast syntax was very badly chosen. It is impossible to resolve correctly without knowing whether what is inside the cast is a type or a value. Here are some cases in C:

    (f)*a    // cast the value pointed to by a to type f, or multiply f by a?
    (f)(a+b) // cast a+b to type f or call the function f with argument a+b?
    

    The easiest way I've found to handle this is to let the lexer consult the symbol table, so that it can return different token types for Typenames and Identifiers. (Which is pretty well the way the C grammar works.) Sharing the symbol table between the lexer and the parser is a bit ugly, particularly if you allow scoped type definitions, but it can be done. For the purposes of correct lexing, the only thing that the lexer should need to know is whether the symbol is the name of a type or not; you can probably get away with assuming that a symbol is not a type unless it has been explicitly declared a type, so that it is not strictly necessary to insist that all symbols be declared before use. (In C++, for example, a non-type class member does not have to be declared before use, and that can still work.)

    Having said all that, I'd strongly encourage you to think seriously about using a different syntax for casts, if at all possible. Aside from being a pain for parsers, they can also be annoyingly ambiguous for human readers (see the examples above), and casting is not (or should not be) so common that it requires an abbreviated syntax.

    C++-style casts -- cast<TYPE>(value) -- suffer from the ambiguity introduced by recycling a comparison operator as a bracket, but are tolerable if the operations using angle brackets are a fixed set of keywords. (Otherwise, you create another context where the lexer needs to distinguish between name kinds.)

    Personally, particularly for a query language, I'd go with something like

    value as type
    

    as in

    (3 + 4) as double
    

    which is (IMO) more readable and much easier to parse without being harder to type.