special-case lexer rule in ply


Is there a way to special-case a ply lexer rule?

t_IDENT     = r'[a-zA-Z_][0-9a-zA-Z_]*'
t_OPERATOR  = r'[<>=/*+-]+'
t_DEFINE    = r'='
t_PRODUCES  = r'=>'

I want to define an operator as any combination of the listed characters, except that = and => have their own special cases. For example:

a + b
# IDENT('a') OPERATOR('+') IDENT('b') 

a ++=--> b
# IDENT('a') OPERATOR('++=-->') IDENT('b') 

a == b
# IDENT('a') OPERATOR('==') IDENT('b')

a => b
# IDENT('a') PRODUCES('=>') IDENT('b') 

a = b
# IDENT('a') DEFINE('=') IDENT('b') 

a >= b
# IDENT('a') OPERATOR('>=') IDENT('b') 

a <=> b
# IDENT('a') OPERATOR('<=>') IDENT('b') 

Solution

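  • I removed the string-defined t_DEFINE and t_PRODUCES rules and used the reserved-word technique from the ply documentation instead. t_OPERATOR matches every operator, including = and =>, and a dictionary lookup on the matched text then changes the token type for the two special cases. DEFINE and PRODUCES must still be listed in tokens, because ply checks the type of each returned token against that list: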

    special_operators = {'=': 'DEFINE',
                         '=>': 'PRODUCES'}
    
    def t_OPERATOR(t):
        r'[<>=/*+-]+'
        t.type = special_operators.get(t.value, t.type)
        return t
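
To check the lookup idea against the examples in the question, here is a minimal sketch that deliberately uses only the standard `re` module instead of ply, so it approximates what the lexer does rather than reproducing it. The `tokenize` helper and the `SKIP` group are assumptions made for illustration; the two patterns and the `special_operators` table come from the post.

```python
import re

# Patterns copied from the question; SKIP is an assumed whitespace rule.
IDENT = r'(?P<IDENT>[a-zA-Z_][0-9a-zA-Z_]*)'
OPERATOR = r'(?P<OPERATOR>[<>=/*+-]+)'
SKIP = r'(?P<SKIP>\s+)'
TOKEN_RE = re.compile('|'.join([IDENT, OPERATOR, SKIP]))

# Operators that get their own token type, as in the answer.
special_operators = {'=': 'DEFINE',
                     '=>': 'PRODUCES'}


def tokenize(text):
    """Return a list of (type, value) pairs for text (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f'illegal character {text[pos]!r} at {pos}')
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == 'SKIP':
            continue
        if kind == 'OPERATOR':
            # Same step as in t_OPERATOR: retag only exact matches.
            kind = special_operators.get(value, kind)
        tokens.append((kind, value))
    return tokens


for line in ['a + b', 'a == b', 'a => b', 'a = b', 'a <=> b']:
    print(line, tokenize(line))
```

Because the lookup uses the whole matched string, `==`, `>=` and `<=>` stay OPERATOR even though they contain `=`; only an operator that is exactly `=` or `=>` is retagged.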