Search code examples
parsingprologbacktrackingpegprolog-cut

Implementing "cut" in a recursive descent parser


I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, of which whomever knows Prolog must know about.

The idea is that after a cut (!) symbol has been parsed, then no alternative options should be attempted at the same level.

expre = '(' ! list ')' | atom.

Means that after the ( is seen, the parsing must succeed, or fail without trying the second option.

I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut exception that would abort the enclosing choice, but that didn't work.

Any pointers to how this functionality is implemented in other parser generators would be helpful.

Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then the code generated for the choice (|) operator will skip the following choices if it catches a FailedCut. What I mean by lack of locality is that the choice catching the FailedCut may be deep up in calls, thus having an effect too-difficult to discern.

Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so the error reporting is refers to that location in the source, instead of to another location where some other option might have been available.

It just occurred to me that if the code generated for a rule/predicate catches FailedCut and translates it into a normal FailedParse exception, then the cuts will have the right scope.

In reference to @false's question, here's a complete example of what I want to work:

start = expre ;

expre = named | term ;

named = word ':' ! term;

term = word ;

In that grammar, word can be reached through named or term, but I would like the parser to commit to the named branch after it has seen the :.

The Solution

To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.

In the final solution, sequences are enclosed with this context manager:

@contextmanager
def _sequence(self):
    self._push_cut()
    try:
        yield
    except FailedParse as e:
        if self._cut():
            self.error(e, FailedCut)
        else:
            raise
    finally:
        self._pop_cut()

And options in a choice function are enclosed with this:

@contextmanager
def _option(self):
    p = self._pos
    try:
        self._push_ast()
        try:
            yield
            ast = self.ast
        finally:
            self._pop_ast()
        self.ast.update(ast)
    except FailedCut as e:
        self._goto(p)
        raise e.nested
    except FailedParse:
        self._goto(p)

Which forces an exit out of the choice instead of a return to try the next option.

The cuts themselves are implemented thus:

def _cut(self):
    self._cut_stack[-1] = True

The full source code may be found on Bitbucket.


Solution

  • The solution proposed at the end of my question worked:

    cut_seen = False
    try:
        self.token('(')
        cut_seen = True 
        self.call('list')
        self.token(')')
    except FailedParse as e:
        if cut_seen:
             raise FailedCut(e)
        raise
    

    Then, any time a choice or optional is evaluated, the code looks like this:

    p = self.pos
    try:
       # code for the expression
    except FailedCut:
        raise
    except FailedParse:
        self.goto(p)
    

    Edit

    The actual solution required keeping a "cut stack". The source code is int Bitbucket.