What is the equivalent of flex's REJECT in ply? I want ply to detect both a LETTER token and a WORD token for the same text, but only LETTER tokens are detected.
import ply.lex as lex
from ply.lex import TOKEN

tokens = (
    'LETTER',
    'WORD'
)

@TOKEN(r'[a-zA-Z]')
def t_LETTER(t):
    print('L')
    return t

@TOKEN(rf'{t_LETTER}*')
def t_WORD(t):
    print('W')
    return t

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()

# Test it out
# Give the lexer some input
while True:
    lexer.input(input())

    # Tokenize
    while True:
        tok = lexer.token()
        if not tok:
            break  # No more input
        print(tok)
When I execute the code for the input av, the output is:

L
LexToken(LETTER,'a',1,0)
L
LexToken(LETTER,'v',1,1)

But I want the WORD token to be detected as well. In flex I have REJECT for this, but in ply I couldn't find an alternative yet.
There is no equivalent to REJECT in Ply. But that's not why your program doesn't recognise WORD tokens; those aren't recognised because when Python expands rf'{t_LETTER}*', it does not produce '[a-zA-Z]*', since the value of t_LETTER is a function, not a string.
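If you want to reuse the letter pattern in the WORD rule, the usual approach is to keep it as a plain string and build the longer pattern from it. Here is a minimal sketch of that fix (LETTER_RE is a name I've made up); note that this alone still won't give you both tokens for the same text, because Ply tries function rules in the order they are defined and emits only the first match:

import ply.lex as lex
from ply.lex import TOKEN

tokens = ('LETTER', 'WORD')

LETTER_RE = r'[a-zA-Z]'        # shared sub-pattern kept as a string

@TOKEN(LETTER_RE + r'+')       # expands to '[a-zA-Z]+' as intended; '+' rather than '*' so the rule cannot match the empty string
def t_WORD(t):
    return t

@TOKEN(LETTER_RE)
def t_LETTER(t):
    return t

With t_WORD defined first, the input av is reported as a single WORD token; swapping the definition order would give you the two LETTER tokens instead, but never both.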
Using REJECT in the WORD action in (f)lex might not be what you're looking for either, but in any case REJECT is an extremely inefficient operation and is not recommended for modern code. Flex would tokenise abcd as:
WORD abc
WORD ab
WORD a
LETTER a
WORD bcd
WORD bc
WORD b
LETTER b
WORD cd
WORD c
LETTER c
WORD d
LETTER d
Maybe that's what you expect, but it seems a bit odd to me. In both Ply and flex, you can achieve similar results by using a combination of pushing characters back into the input stream (using yyless or unput in flex, or modifying lex.lexpos in Ply) and changing the lexer state using start conditions.
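Here is a rough sketch of that lexpos/start-condition approach in Ply, assuming what you actually want is one WORD token plus one LETTER token per character of that word (rather than flex's full REJECT cascade). The word_end attribute is something I attach to the lexer myself:

import ply.lex as lex

tokens = ('LETTER', 'WORD')

# Exclusive state in which the word just matched is re-scanned letter by letter.
states = (
    ('letters', 'exclusive'),
)

def t_WORD(t):
    r'[a-zA-Z]+'
    # Remember where the word ends, rewind the lexer to the word's start,
    # and switch to the 'letters' state so the same text is scanned again.
    t.lexer.word_end = t.lexer.lexpos
    t.lexer.lexpos = t.lexpos
    t.lexer.begin('letters')
    return t

def t_letters_LETTER(t):
    r'[a-zA-Z]'
    # Leave the 'letters' state once the whole word has been re-emitted.
    if t.lexer.lexpos >= t.lexer.word_end:
        t.lexer.begin('INITIAL')
    return t

t_ignore = ' \t'
t_letters_ignore = ''

def t_ANY_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('av')
for tok in lexer:
    print(tok)    # WORD 'av', then LETTER 'a' and LETTER 'v'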