I'm having a problem with creating a lexer using PLY in Python

I recently tried to create a lexer, and it isn't working.

The problem is that it raises an error saying "Can't build lexer". Here's the traceback:

ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
Traceback (most recent call last):
  File "...\Lexer.py", line 24, in <module>
    lexer = lex.lex()
            ^^^^^^^^^
  File "...\lex.py", line 910, in lex
    raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer

I think it's because of my t_error() function. I also suspect the tokens I've defined may have a problem. Please help me with that. I know this is kind of dumb, but I'm new, so please be nice to me. By the way, here's the source code:

import ply.lex as lex
import ply.yacc as yacc

import sys

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
]

t_INT = r"\d+"
t_ID = r"[a-zA-Z_][a-zA-Z0-9_]*"
t_PLUS = r"+"
t_MINUS = r"-"
t_TIMES = r"*"
t_DIVIDE = r"/"

def t_error(t):
    print("Illegal character '%s'" % t.lexer.lexeme, file=sys.stderr)

lexer = lex.lex()

def p_expression(p):
    """expression : INT
                 | ID
                 | expression PLUS expression
                 | expression MINUS expression
                 | expression TIMES expression
                 | expression DIVIDE expression"""
    if len(p) == 2:
        if isinstance(p[1], int):
            p[0] = p[1]
        elif isinstance(p[1], str):
            p[0] = p[1]
    else:
        if p[2] == "+":
            p[0] = p[1] + p[3]
        elif p[2] == "-":
            p[0] = p[1] - p[3]
        elif p[2] == "*":
            p[0] = p[1] * p[3]
        elif p[2] == "/":
            p[0] = p[1] / p[3]

parser = yacc.yacc()

def test(text):
    try:
        result = parser.parse(text)
        if result:
            print(result)
        else:
            print("Empty expression")
    except yacc.YaccError:
        print("Error parsing input")

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123 + 456")
    test("123 - 456")
    test("123 * 456")
    test("123 / 456")

Maybe I'm just stupid, but because of this I can't get it to run.

Solution

These errors...

ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE

...seem pretty clear. You have rules named t_TIMES and t_DIVIDE, but your tokens list doesn't contain TIMES or DIVIDE. You need:

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
    "TIMES",
    "DIVIDE",
]
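The check behind those two messages can be approximated without PLY. The sketch below is only an illustration: unspecified_rule_tokens is a hypothetical helper, not a PLY function, and it ignores lexer states and ignore_-prefixed rules for simplicity.

```python
# Hypothetical helper (not part of PLY): mirrors the idea behind the error.
def unspecified_rule_tokens(namespace, tokens):
    """Return token names that have a t_NAME rule but are missing from tokens."""
    special = {"error", "ignore", "eof"}  # rule names PLY treats specially
    rule_names = {
        name[2:]
        for name in namespace
        if name.startswith("t_") and name[2:] not in special
    }
    return sorted(rule_names - set(tokens))


# The rules and token list from the question, reduced to plain data.
rules = {
    "t_INT": r"\d+",
    "t_ID": r"[a-zA-Z_][a-zA-Z0-9_]*",
    "t_PLUS": r"+",
    "t_MINUS": r"-",
    "t_TIMES": r"*",
    "t_DIVIDE": r"/",
    "t_error": None,
}
tokens = ["INT", "ID", "PLUS", "MINUS", "EOF"]

missing = unspecified_rule_tokens(rules, tokens)
print(missing)  # ['DIVIDE', 'TIMES']
```

Every name it reports needs to be added to tokens (or the matching rule removed).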

Once you fix those errors, you will get:

ERROR: Invalid regular expression for rule 't_PLUS'. nothing to repeat at position 11
ERROR: Invalid regular expression for rule 't_TIMES'. nothing to repeat at position 12

That's because the characters + and * are regex quantifiers (they mean "repeat the preceding item"), so on their own there is nothing for them to repeat. You need to escape them if you want the literal character:

t_PLUS = r"\+"
t_TIMES = r"\*"
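You can reproduce this with the standard re module alone. The sketch below assumes PLY wraps each rule as (?P<rule_name>pattern) before compiling it, which would explain why the reported positions are 11 and 12 rather than 0; compile_error is a hypothetical helper written for this illustration.

```python
import re


def compile_error(pattern):
    """Return (message, position) if the pattern fails to compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return exc.msg, exc.pos
    return None


# Assumption: PLY compiles each rule inside a named group like this.
bad_plus = compile_error("(?P<t_PLUS>+)")
bad_times = compile_error("(?P<t_TIMES>*)")
good_plus = compile_error(r"(?P<t_PLUS>\+)")

print(bad_plus)   # ('nothing to repeat', 11)
print(bad_times)  # ('nothing to repeat', 12)
print(good_plus)  # None, the escaped pattern compiles

# re.escape produces the escaped form if you don't want to write it by hand.
print(re.escape("+"), re.escape("*"))
```

The escaped pattern matches a literal + character, which is what the token rule needs.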

Once you fix those errors, you'll get this from your t_error function:

AttributeError: 'Lexer' object has no attribute 'lexeme'. Did you mean: 'lexre'?

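The Lexer object has no lexeme attribute, but you can use t.value. Inside t_error, t.value holds the rest of the input that has not been tokenized yet, starting at the offending character: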

def t_error(t):
    print("Illegal character '%s'" % t.value, file=sys.stderr)

Having fixed that error, you will now get:

123
hello
Illegal character ' + 456'
.
.
.
ply.lex.LexError: Scanning error. Illegal character ' '

You have spaces in your expressions, but you haven't accounted for this in your rules. The quick fix is to remove the spaces in your test expressions:

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123+456")
    test("123-456")
    test("123*456")
    test("123/456")
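If you would rather keep the spaces, the usual alternative is to tell the lexer to skip them with PLY's t_ignore string. The following is a minimal lexer-only sketch, assuming PLY 3.x is installed; the token names come from the question, and the t_error shown here also reports only the first offending character and calls t.lexer.skip(1) so that lexing can continue.

```python
import sys

import ply.lex as lex

tokens = ["INT", "PLUS", "MINUS", "TIMES", "DIVIDE"]

t_INT = r"\d+"
t_PLUS = r"\+"
t_MINUS = r"-"
t_TIMES = r"\*"
t_DIVIDE = r"/"

# Characters listed here are silently skipped between tokens (space and tab).
t_ignore = " \t"


def t_error(t):
    # t.value is the remaining input, so t.value[0] is the offending character.
    print("Illegal character '%s'" % t.value[0], file=sys.stderr)
    t.lexer.skip(1)  # step past it instead of aborting


lexer = lex.lex()
lexer.input("123 + 456")
token_types = [tok.type for tok in lexer]
print(token_types)  # ['INT', 'PLUS', 'INT']
```

With t_ignore in place, the original test strings with spaces should tokenize without reaching t_error.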

Having fixed that error, you'll now get:

123
hello
123456
.
.
.
TypeError: unsupported operand type(s) for -: 'str' and 'str'

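And that's because the token values reaching your p_expression function are strings. For strings, + concatenates (hence 123456 instead of 579), and - is not defined at all (hence the TypeError). You need to convert the values to numbers before applying arithmetic operators. The easiest solution is to replace your definition of t_INT with this function: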

def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t

And now running the code produces:

123
hello
579
-333
56088
0.26973684210526316
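For reference, the string-versus-integer behaviour behind the earlier 123456 output and the TypeError can be reproduced without PLY. This is only a sketch: apply_op is a hypothetical stand-in for the operator branches in p_expression.

```python
def apply_op(op, left, right):
    """Apply a binary operator the same way the p_expression branches do."""
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    raise ValueError("unsupported operator: %s" % op)


# With string token values, + concatenates and - is undefined.
concatenated = apply_op("+", "123", "456")
try:
    apply_op("-", "123", "456")
    subtraction_error = None
except TypeError as exc:
    subtraction_error = str(exc)

# After int() conversion (what the t_INT function does), arithmetic works.
total = apply_op("+", int("123"), int("456"))
difference = apply_op("-", int("123"), int("456"))

print(concatenated)       # 123456
print(subtraction_error)  # unsupported operand type(s) for -: 'str' and 'str'
print(total, difference)  # 579 -333
```

Note that ID tokens such as hello are still strings, so an expression that mixes an identifier with a number would fail in the same way.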