I recently tried to create a lexer, and it isn't working.
The problem is that it fails with the error message "Can't build lexer". Here's the traceback:
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
Traceback (most recent call last):
  File "...\Lexer.py", line 24, in <module>
    lexer = lex.lex()
            ^^^^^^^^^
  File "...\lex.py", line 910, in lex
    raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer
I think it's because of my t_error() function. I also suspect there may be a problem with the tokens I've defined. Please help me with that. I know this is probably a basic question, but I'm new to this, so please be nice.
By the way, here's the source code:
import ply.lex as lex
import ply.yacc as yacc
import sys

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
]

t_INT = r"\d+"
t_ID = r"[a-zA-Z_][a-zA-Z0-9_]*"
t_PLUS = r"+"
t_MINUS = r"-"
t_TIMES = r"*"
t_DIVIDE = r"/"

def t_error(t):
    print("Illegal character '%s'" % t.lexer.lexeme, file=sys.stderr)

lexer = lex.lex()

def p_expression(p):
    """expression : INT
                  | ID
                  | expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression"""
    if len(p) == 2:
        if isinstance(p[1], int):
            p[0] = p[1]
        elif isinstance(p[1], str):
            p[0] = p[1]
    else:
        if p[2] == "+":
            p[0] = p[1] + p[3]
        elif p[2] == "-":
            p[0] = p[1] - p[3]
        elif p[2] == "*":
            p[0] = p[1] * p[3]
        elif p[2] == "/":
            p[0] = p[1] / p[3]

parser = yacc.yacc()

def test(text):
    try:
        result = parser.parse(text)
        if result:
            print(result)
        else:
            print("Empty expression")
    except yacc.YaccError:
        print("Error parsing input")

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123 + 456")
    test("123 - 456")
    test("123 * 456")
    test("123 / 456")
Maybe I'm just missing something obvious, but I can't get it to run.
These errors...
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
...seem pretty clear. You haven't defined tokens named TIMES or DIVIDE in your tokens list. You need:
tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
    "TIMES",
    "DIVIDE",
]
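As a rough illustration of the check behind that error, you can compare the t_ rule names against the tokens list yourself. This is only a sketch, not PLY's actual implementation: unspecified_rules is a hypothetical helper, and it ignores PLY features such as lexer states and t_ignore_ rules.

```python
def unspecified_rules(rule_names, tokens):
    """Return token names that have a t_ rule but are missing from tokens."""
    # t_error, t_ignore and t_eof are special names, not tokens
    special = {"error", "ignore", "eof"}
    names = (name[2:] for name in rule_names if name.startswith("t_"))
    return sorted(n for n in names if n not in special and n not in tokens)


tokens = ["INT", "ID", "PLUS", "MINUS", "EOF"]
rules = ["t_INT", "t_ID", "t_PLUS", "t_MINUS", "t_TIMES", "t_DIVIDE", "t_error"]

print(unspecified_rules(rules, tokens))  # ['DIVIDE', 'TIMES']
```

Adding "TIMES" and "DIVIDE" to tokens makes the helper return an empty list.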
Once you fix those errors, you will get:
ERROR: Invalid regular expression for rule 't_PLUS'. nothing to repeat at position 11
ERROR: Invalid regular expression for rule 't_TIMES'. nothing to repeat at position 12
That's because the characters + and * are both regex quantifiers (metacharacters, not literals), so you need to escape them if you want the literal character:
t_PLUS = r"\+"
t_TIMES = r"\*"
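You can reproduce the same regex complaint without PLY by compiling the patterns directly with the standard re module. This is a minimal sketch; regex_problem is a hypothetical helper name used only for illustration.

```python
import re


def regex_problem(pattern):
    """Return the re error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for pattern in ("+", "*", r"\+", r"\*", "-", "/"):
    problem = regex_problem(pattern)
    print(repr(pattern), "->", problem or "ok")

# re.escape builds an escaped pattern that matches the literal text
print(re.escape("+"), re.escape("*"))
```

The unescaped "+" and "*" are rejected with a "nothing to repeat" message, while the escaped forms and the plain "-" and "/" compile.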
Once you fix those errors, you'll ultimately get this from your t_error function:
AttributeError: 'Lexer' object has no attribute 'lexeme'. Did you mean: 'lexre'?
There doesn't appear to be a lexeme attribute, but you can use t.value, which inside t_error holds the rest of the input that has not been tokenized yet:
def t_error(t):
    print("Illegal character '%s'" % t.value, file=sys.stderr)
Having fixed that error, you will now get:
123
hello
Illegal character ' + 456'
.
.
.
ply.lex.LexError: Scanning error. Illegal character ' '
You have spaces in your expressions, but your rules don't account for them. The quick fix is to remove the spaces from your test expressions (the more usual PLY approach is to define t_ignore = " \t" so the lexer skips whitespace):
if __name__ == "__main__":
    test("123")
    test("hello")
    test("123+456")
    test("123-456")
    test("123*456")
    test("123/456")
Having fixed that error, you'll now get:
123
hello
123456
.
.
.
TypeError: unsupported operand type(s) for -: 'str' and 'str'
And that's because the token values reaching your p_expression function are still strings: + concatenated them into 123456, and - isn't defined for strings at all. You need to convert them to numbers before applying arithmetic operators. The easiest solution is to replace your definition of t_INT with this function:
def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t
And now running the code produces:
123
hello
579
-333
56088
0.26973684210526316
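To see why the earlier run printed 123456 and then failed on the subtraction, here is a small sketch, independent of PLY, of how Python's operators treat string operands versus integers. The variable names are just for illustration.

```python
left, right = "123", "456"

print(left + right)            # str + str concatenates: 123456
print(int(left) + int(right))  # int + int adds: 579

try:
    left - right
except TypeError as exc:
    # subtraction is not defined for str operands
    print(exc)
```

Converting in t_INT means every INT token already carries an int by the time the parser rule sees it, so the rule itself needs no conversion.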