Search code examples
sqlparsinglemon

"Partial parsing" with lemon


I have an SQL grammar built using the lemon parser generator. The normal entry point to parsing a command is a statement (like SELECT ...), so the statement is my %start non-terminal in the grammar. So far everything works fine.

Now I want to do a "partial parse", e.g. only parse an expression, or a WHERE clause. Basically this means I want the %start non-terminal to change at runtime. I couldn't find anything on that in the documentation. Is this possible in lemon?

If not I was thinking of doing something like letting the parse fail at my custom starting point. This feels like quite a hack, is there a cleaner way?


Solution

  • A standard LALR parser generator (like LEMON) won't (in fact can't) let you do what you want.

    To make this work, you have to create a start state which is willing to accept any of the nonterminals of interest to you (e.g., SELECT_clause, WHERE_clause). You do this by effectively changing the grammar from:

       GOAL = TOP ;
       RULEn = ... ;
       SG = .... ;
    

    where SG is the secondary goal (eg., your WHERE clause) into

       GOAL = TOP | SG ;
       RULEn = ... ;
       SG = ... ;
    

    The bad news when you do this is that the conditions acceptable for LALR state generation are usually violated (e.g., the lookahead sets no longer distinguish the reductions), and now your parser generator no longer works.

    This is a lot easier to to with a GLR parser generator, which can handle any context-free grammar. (We in fact do exactly this for patterns defined in terms of nonterminals for our DMS Software Reengineering Toolkit. In fact, we go a little over the top and put every nonterminal in the goal production. While this sounds crazy, it allows us to recognize any well-formed (nonterminal) phrase from the language ).