
Python PLY get next token during parsing


I'm trying to get the next token and perform some operation depending on it. I know this is odd, but is it possible to do something like this?

def p_func(p):
    '''expr : MY_TOKEN'''
    if next_token is None:
        # do something here
        pass
    p[0] = p[1]

I've tried to do the following:

def p_func(p):
    '''expr : MY_TOKEN'''
    if parser.token() is None:
        # do something here
        pass
    p[0] = p[1]

This does obtain a token, but after the function returns the next token is skipped, because I have already consumed it. Is it possible to push it back, or to get just a copy of the next token?


Solution

  • I don't believe there is a reliable way to do this in Ply.

    Ply usually reads the next token (the "lookahead token") before performing reductions, so calling parser.token() will usually return the token after the lookahead, that is, the second next token. But Ply does not guarantee to read the next token before the reduction: in some cases, where it can deduce the action without a lookahead, it performs the reduction immediately. In those cases the token produced by parser.token() is the next token itself, as it apparently is in the particular rule in which you tried it.

    If you require consistency, you can instruct Ply to always read the lookahead token before doing reductions. Obviously, there is no way to tell it to never read the lookahead token, since sometimes it is needed to decide upon a parser action.
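
    For reference, the Ply documentation describes this switch as clearing the parser's table of "defaulted states", the states in which it reduces without consulting the lookahead. A minimal sketch follows, wrapped in a helper whose name (force_lookahead) is made up for illustration; it assumes the argument is the object returned by ply.yacc.yacc():

```python
def force_lookahead(parser):
    """Make a Ply parser fetch the lookahead token before every action.

    `parser` is assumed to be the object returned by ply.yacc.yacc().
    The helper name is hypothetical; the assignment below is the one the
    Ply documentation gives for disabling defaulted states.
    """
    parser.defaulted_states = {}
    return parser


# Typical use (assumes the grammar rules are defined in the calling module):
#     import ply.yacc as yacc
#     parser = force_lookahead(yacc.yacc())
```

    Note that this only makes the timing of the read predictable; it does not by itself expose the lookahead token to the action functions.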

    This would be fine if the lookahead token were available to parser actions, as it is in parsers generated by bison (for example). Unfortunately, in Ply the lookahead token is kept as a local variable in the parser, which is more efficient but less accessible.

    You could modify the source code of Ply to make the current lookahead token a member of the parser object instead of a local variable. (It's called lookahead if you want to pursue that idea [Note 1].) That would introduce a very small slowdown in the parser but I doubt whether it would be visible in practice. However, that will make it more complicated to share your code; you would have to distribute the entire modified Ply package, presumably renamed, as part of your application.
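
    A less invasive possibility, offered as an untested sketch rather than a recommendation, is to wrap the lexer so that it remembers the last token it handed to the parser. The class name LookaheadTrackingLexer and its last_token attribute are hypothetical. The idea only holds if the parser is guaranteed to have fetched its lookahead before each reduction (defaulted states disabled), and it does not account for tokens that Ply re-queues internally during error recovery:

```python
class LookaheadTrackingLexer:
    """Wrap a lexer and remember the last token handed to the parser.

    Hypothetical helper, not part of Ply. The wrapped object is assumed to
    offer input(data) and token(), as lexers built by ply.lex.lex() do.
    """

    def __init__(self, lexer):
        self._lexer = lexer
        self.last_token = None  # None before the first read and at end of input

    def input(self, data):
        self.last_token = None
        self._lexer.input(data)

    def token(self):
        # The parser calls token() for each token; None signals end of input.
        self.last_token = self._lexer.token()
        return self.last_token

    def __getattr__(self, name):
        # Delegate everything else (lineno, lexpos, ...) to the real lexer.
        if name == "_lexer":
            raise AttributeError(name)
        return getattr(self._lexer, name)


# Possible use with Ply (assumes `lex` and `parser` already exist):
#     parser.defaulted_states = {}
#     result = parser.parse(text, lexer=LookaheadTrackingLexer(lex.lex()))
# and, inside an action function:
#     if p.lexer.last_token is None: ...   # nothing follows the reduced symbols
```

    Under those assumptions, the last token read when a reduction runs is the parser's lookahead, so the action can inspect it without consuming anything.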

    The most common use of the lookahead token is crafting better error messages, or otherwise assisting with error recovery. Using it to change the behaviour of a reduction action strikes me as suboptimal, since most such cases can be achieved by using a better grammar. But presumably you've explored alternatives so I'll leave it at that.
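
    To illustrate what "a better grammar" might look like: the question does not show the full grammar, so the fragment below assumes the goal is to treat a MY_TOKEN that ends the input differently from one that is followed by more tokens. Only MY_TOKEN comes from the question; the non-terminal program and both function names are invented for the sketch.

```python
# Hypothetical grammar fragment in Ply's docstring notation.

def p_program_last(p):
    '''program : MY_TOKEN'''
    # In this small grammar the rule is reduced only when nothing follows
    # MY_TOKEN, so the "next token is None" work can go here.
    p[0] = [('last', p[1])]


def p_program_more(p):
    '''program : MY_TOKEN program'''
    # MY_TOKEN followed by more input: the ordinary case.
    p[0] = [('inner', p[1])] + p[2]
```

    Here the parser's own lookahead decides which rule applies, so neither action needs to inspect the token stream.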


    Notes

    1. Be careful when modifying yacc.py: there are several versions of the Parser object, based on different possible optimisations. The standard install script autogenerates this file from a skeleton in order to keep the various optimised implementations in sync with each other. To make this modification, you'll either have to use Ply's build mechanism or carefully make the change in all of the different versions (and there are comments in the file advising against that).