Search code examples
parsinggrammarbisonbnf

Is left-factoring of grammar necessary in Bison?


I am making a parser using bison. I just wanna ask if it still necessary for a grammar to be left-factored when used in bison. I tried giving bison a non-left-factored grammar and it didn't gave any warning or error and it also accepted the example syntax I gave to the parser, but I'm worried that it the parser may not be accurate in every input.


Solution

  • Left factoring is how you remove LL-conflicts in a grammar. Since Bison uses LALR it has no problems with left recursion or any other LL-conflicts (indeed, left recursion is preferable as it minimizes stack requirements), so left factoring is neither necessary nor desirable.

    Note that left factoring won't break anything -- bison can deal with a left-factored grammar as well as a non-left factored one, but it may require more resources (memory) to parse the left-factored grammar, so in general, don't.

    edit

    You seem to be confused about how LL-vs-LR parsing work and how the structure of the grammar affects each.

    LL parsing is top down -- you start with just the start symbol on the parse stack, and at each step, you replace the non-terminal on top of the stack with the symbols from the right side of some rule for that non-terminal. When there is a terminal on top of the stack, it must match the next token of input, so you pop it and consume the input. The goal being to consume all the input and end up with an empty stack.

    LR parsing is bottom up -- you start with an empty stack, and at each step you either copy a token from the input to the stack (consuming it), or you replace a sequence of symbols on the top of the stack corresponding to the right side of some rule with the single symbol from the left side of the rule. The goal being to consume all the input and be left with just the start symbol on the stack.

    So different rules for the same non-terminal which start with the same symbols on the right side are a big problem for LL parsing -- you could replace that non-terminal with the symbols from either rule and match the next few tokens of input, so you would need more lookahead to know which to do. But for LR parsing, there's no problem -- you just shift (move) the tokens from the input to the stack and when you get to the later tokens you decide which right side it matches.

    LR parsing tends to have problems with rules that end with the same tokens on the right hand side, rather than rules that start with the same tokens. In your example from John Levine's book, there are rules "cart_animal ::= HORSE" and "work_animal ::= HORSE", so after shifting a HORSE symbol, it could be reduced (replace by) either "cart_animal" or "work_animal". Since the context allows either to be followed by the "AND" token, you end up with a reduce/reduce (LR) conflict when the next token is "AND".