
REJECT equivalent in ply


What is the flex REJECT equivalent in ply? For my code I want ply to detect token LETTER and also WORD for the same text, but only LETTER tokens are detected.

import ply.lex as lex
from ply.lex import TOKEN


tokens = (
    'LETTER',
    'WORD'
)


@TOKEN(r'[a-zA-Z]')
def t_LETTER(t):
    print('L')
    return t


@TOKEN(rf'{t_LETTER}*')
def t_WORD(t):
    print('W')
    return t


# Error handling rule

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()

# Test it out

# Give the lexer some input
while True:
    lexer.input(input())

    # Tokenize
    while True:
        tok = lexer.token()
        if not tok:
            break      # No more input
        print(tok)

When I run the code with the input av, the output is:

    L
    LexToken(LETTER,'a',1,0)
    L
    LexToken(LETTER,'v',1,1)

But I want the WORD token to be detected as well. In flex I have REJECT for this, but in ply I couldn't find an alternative yet.


Solution

  • There is no equivalent to REJECT in Ply. But that's not why your program doesn't recognise WORD tokens. They aren't recognised because when Python expands f'{t_LETTER}*', it does not produce '[a-zA-Z]*': the value of t_LETTER is a function, not a string.
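
    The expansion problem can be seen without Ply at all. The following is a minimal sketch using only the standard library; the names LETTER, broken_pattern and word_pattern are illustrative and not part of the original code. It keeps the letter pattern in a plain string and builds the word pattern from that string instead of from the rule function.

```python
import re


def t_LETTER(t):
    """Stand-in for the Ply rule function from the question."""
    return t


# Interpolating the function itself inserts its repr, not its regex.
broken_pattern = rf'{t_LETTER}*'

# Keeping the regex in a string (hypothetical name LETTER) makes it reusable.
LETTER = r'[a-zA-Z]'
word_pattern = rf'{LETTER}+'

print(broken_pattern)   # something like "<function t_LETTER at 0x...>*"
print(word_pattern)     # [a-zA-Z]+
print(re.fullmatch(word_pattern, 'av') is not None)
```

    In the lexer itself, the same strings could then be passed to the decorators, for example @TOKEN(LETTER) and @TOKEN(word_pattern). The sketch uses + rather than * because Ply refuses rules whose pattern can match the empty string. Note also that Ply tries function rules in the order they are defined, so a WORD rule defined after LETTER would still never win on a letter.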

    Using REJECT in the WORD action in (f)lex might not be what you're looking for either, but in any case REJECT is an extremely inefficient operation and is not recommended for modern code. Flex would tokenise abcd as

    WORD abcd
    WORD abc
    WORD ab
    WORD a
    LETTER a
    WORD bcd
    WORD bc
    WORD b
    LETTER b
    WORD cd
    WORD c
    LETTER c
    WORD d
    LETTER d
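
    To make that sequence easier to follow, here is a small pure-Python simulation of the behaviour described above. It is only a sketch under stated assumptions: a WORD rule whose action always rejects, and a LETTER rule that accepts and consumes one character. It is not flex, and simulate_reject is a hypothetical helper name.

```python
import re

LETTER_RUN = re.compile(r'[a-zA-Z]+')


def simulate_reject(text):
    """Return (rule, lexeme) pairs in the order the actions would fire."""
    events = []
    pos = 0
    while pos < len(text):
        m = LETTER_RUN.match(text, pos)
        if m is None:
            pos += 1              # not a letter: skip it
            continue
        run = m.group()
        # WORD matches the longest prefix first; every rejection makes the
        # scanner fall back to the next shorter match.
        for end in range(len(run), 0, -1):
            events.append(('WORD', run[:end]))
        # LETTER is the last alternative; it accepts, consuming one character.
        events.append(('LETTER', run[0]))
        pos += 1
    return events


for rule, lexeme in simulate_reject('abcd'):
    print(rule, lexeme)
```

    The number of actions grows quadratically with the length of each word, which illustrates why rescanning of this kind is costly.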
    

    Maybe that's what you expect, but it seems a bit odd to me. In both Ply and flex, you can achieve similar results by using a combination of pushing characters back into the input stream (using yyless or unput in flex, or modifying lex.lexpos in Ply), and changing the lexer state using start conditions.
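
    The push-back idea can be sketched with a hand-written scanner built on the standard re module. This is not Ply code, and tokenize, state and word_end are illustrative names. The state variable plays the role of a start condition, and leaving pos at the start of the word plays the role of resetting the lexer position. In Ply, the corresponding steps would be assigning to t.lexer.lexpos and calling t.lexer.begin() inside the rule functions.

```python
import re

WORD_RE = re.compile(r'[a-zA-Z]+')
LETTER_RE = re.compile(r'[a-zA-Z]')


def tokenize(text):
    """Yield (type, value, position); each WORD is followed by its LETTERs."""
    pos = 0
    state = 'INITIAL'   # current start condition
    word_end = 0        # where the word being rescanned ends
    while pos < len(text):
        if state == 'INITIAL':
            m = WORD_RE.match(text, pos)
            if m is None:
                pos += 1            # skip anything that is not a letter
                continue
            yield ('WORD', m.group(), pos)
            word_end = m.end()
            # Push the word back: do not advance pos, rescan it as letters.
            state = 'letters'
        else:
            # Inside a word every character is a letter, so this always matches.
            m = LETTER_RE.match(text, pos)
            yield ('LETTER', m.group(), pos)
            pos = m.end()
            if pos >= word_end:
                state = 'INITIAL'   # word fully rescanned, resume normal mode


for tok in tokenize('av'):
    print(tok)
```

    Unlike REJECT, this scans each word only twice, once as a WORD and once letter by letter, so the cost stays linear in the input length.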