Why doesn't parsimonious parse this?


I seem to be completely stuck understanding why this fails to parse. Below is my simple grammar (I'm just playing around trying to understand parsimonious, so the grammar may not make sense).

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

sql_grammar = Grammar(
    """
    select_statement     = "SELECT" ("ALL" / "DISTINCT")? object_alias_section
    object_alias_section = object_name / alias
    object_name          = ~"[ 0-9]*"
    alias                = ~"[ A-Z]*"
    """
)


data = """SELECT A"""


tree = sql_grammar.parse(data)
print("tree:", tree, "\n")

SELECT 10 parses, but for some reason SELECT A fails to parse. My understanding is that either object_name or alias should match. What am I doing wrong? Thanks in advance.
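To see where the input goes wrong, it can help to run the two terminal regexes by hand. The sketch below uses only Python's `re` module, not parsimonious itself, and assumes that a `~"..."` rule behaves like a regex anchored at the current position (what `Pattern.match(text, pos)` does). The helpers `ordered_choice` and `parse_original` are hypothetical, written only to mimic the original grammar's `object_name / alias` choice on the text after `SELECT`.

```python
import re

# Terminal regexes copied from the original grammar (note the `*` quantifier).
OBJECT_NAME = re.compile(r"[ 0-9]*")
ALIAS = re.compile(r"[ A-Z]*")


def ordered_choice(text, pos, *patterns):
    """Return (index, end) for the first pattern that matches at pos.

    Like a PEG `/`, later alternatives are only tried if an earlier one fails.
    """
    for index, pattern in enumerate(patterns):
        m = pattern.match(text, pos)
        if m is not None:
            return index, m.end()
    raise ValueError(f"no alternative matched at {pos}")


def parse_original(data):
    """Hypothetical emulation of `"SELECT" ("ALL" / "DISTINCT")? object_alias_section`.

    Returns the chosen rule name and whatever text was left unconsumed.
    """
    if not data.startswith("SELECT"):
        raise ValueError("expected SELECT")
    pos = len("SELECT")
    # The optional ("ALL" / "DISTINCT") cannot match here: the next char is a space.
    for keyword in ("ALL", "DISTINCT"):
        if data.startswith(keyword, pos):
            pos += len(keyword)
            break
    index, pos = ordered_choice(data, pos, OBJECT_NAME, ALIAS)
    return ("object_name", "alias")[index], data[pos:]


for data in ("SELECT 10", "SELECT A"):
    rule, leftover = parse_original(data)
    print(f"{data!r}: chose {rule}, unconsumed {leftover!r}")
```

Because `[ 0-9]*` can match the empty string, `object_name` always succeeds and `alias` is never tried. For `SELECT A` it consumes only the space, and the leftover `A` is what parsimonious complains about when it reports that the text was not fully consumed.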


Solution

  • There are two problems with your grammar:

    1. Parsimonious doesn't handle whitespace automatically; you must take care of it yourself (some ideas can be derived from https://github.com/erikrose/parsimonious/blob/master/parsimonious/grammar.py#L224)

    2. As stated in README.md, the / operator matches the first alternative that succeeds, so it tries object_name first. Because there is a leftover unparsed space, object_name matches it and parsing of object_alias_section finishes there, leaving A unconsumed. But even if the space were handled correctly, object_name would match the empty string (because of the *), so alias would never be tried and parsing would again finish with an error.

    To fix your grammar, I suggest changing it as follows:

    # r"""...""" keeps the backslash in \s from being read as a Python escape
    sql_grammar = Grammar(
        r"""
        select_statement     = "SELECT" (ws ("ALL" / "DISTINCT"))? ws object_alias_section
        object_alias_section = object_name / alias
        object_name          = ~"[ 0-9]+"
        alias                = ~"[ A-Z]+"
        ws                   = ~"\s+"
        """
    )
    

    and everything should parse correctly.
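As a rough, stdlib-only way to check the suggested grammar's logic without installing anything, the sketch below hand-codes the same rules with `re`. It is an emulation of PEG semantics written for illustration, not parsimonious's implementation; `parse_fixed` and `match_at` are hypothetical helpers. It shows the three behaviours the answer relies on: the optional group rewinds if it fails, `/` stops at the first alternative that succeeds (so `+` is needed for `alias` to get a chance), and the whole input must be consumed.

```python
import re

# Terminals from the suggested grammar: `+` instead of `*`, plus explicit whitespace.
WS = re.compile(r"\s+")
OBJECT_NAME = re.compile(r"[ 0-9]+")
ALIAS = re.compile(r"[ A-Z]+")


def match_at(pattern, text, pos):
    """Return the end position if pattern matches at pos, else None."""
    m = pattern.match(text, pos)
    return m.end() if m else None


def parse_fixed(data):
    """Hypothetical emulation of
    `"SELECT" (ws ("ALL" / "DISTINCT"))? ws object_alias_section`.

    Returns (rule_name, matched_text) or raises ValueError if the whole
    input is not consumed.
    """
    if not data.startswith("SELECT"):
        raise ValueError("expected SELECT")
    pos = len("SELECT")

    # Optional group: try `ws ("ALL" / "DISTINCT")`; if it fails, pos is unchanged.
    end = match_at(WS, data, pos)
    if end is not None:
        for keyword in ("ALL", "DISTINCT"):
            if data.startswith(keyword, end):
                pos = end + len(keyword)
                break

    # Mandatory whitespace before the object name / alias.
    end = match_at(WS, data, pos)
    if end is None:
        raise ValueError(f"expected whitespace at {pos}")
    pos = end

    # Ordered choice: object_name first, alias only if object_name fails.
    for name, pattern in (("object_name", OBJECT_NAME), ("alias", ALIAS)):
        end = match_at(pattern, data, pos)
        if end is not None:
            break
    else:
        raise ValueError(f"no alternative matched at {pos}")

    if end != len(data):
        raise ValueError(f"unconsumed input: {data[end:]!r}")
    return name, data[pos:end]


for data in ("SELECT 10", "SELECT A", "SELECT DISTINCT A"):
    print(data, "->", parse_fixed(data))
```

With `+`, `object_name` fails on `A` instead of matching an empty string, so the choice falls through to `alias` as intended.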