This has maybe been asked before but I do not know what to search for, really. Suppose I have some string I'd like to build a parser with.
I have strings like a OR b
, b OR C
but also a OR (b AND c)
. Now the nested parentheses cause trouble for me and I don't know how to construct the appropriate p_*
functions. Is recursion allowed? If so, how?
import ply.lex as lex
import ply.yacc as yacc
# List of token names. This is always required
tokens = (
'VARIABLE',
'OR',
'AND',
'PAR_OPEN',
'PAR_CLOSE',
)
# Regular expression rules for simple tokens
t_ignore = ' \t'
t_VARIABLE = r'\b[a-z]+\b'
t_OR = r'\bOR\b'
t_AND = r'\bAND\b'
t_PAR_OPEN = r'\('
t_PAR_CLOSE = r'\)'
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
lexer = lex.lex()
def p_term(p):
'''term : VARIABLE OR VARIABLE
| VARIABLE AND VARIABLE
| PAR_OPEN VARIABLE AND VARIABLE PAR_CLOSE'''
if p[2] == 'AND':
p[0] = "".join([p[1], p[3]])
for idx, val in enumerate(p):
print(idx, val)
def p_error(p):
print("Syntax error in input!")
print(p)
parser = yacc.yacc()
res = parser.parse("(a AND b)")
print(res)
I'd also like to call it with e.g. res = parser.parse("a OR (b AND c)")
but to no avail.
Parentheses in an expression is quite common. I will first refer you to part 5 of the PLY documentation which provides an example of nested expression parsing. Yes, recursion is the answer.
There are several phrases that are used to mean the "smallest element of an expression." You might see "atom" or "term" (short for 'terminal') or "primary-expression".
When dealing with parenthetical sub-expressions, this is generally the approach to take. Write a grammar rule that unifies the various low-level things (for example, literal numbers and variable names) and add the sub-expression at that point.
In this example, from the PLY docs, expression
is the highest-level thing, and supports addition and subtraction. The next level down is term
which supports multiplication and division. The lowest-level thing is the factor
, which doesn't support any operations, but does unify NUMBER
and parenthetical sub-expressions. A factor could be 7
but it could also be (7 + 2 * 3)
.
expression : expression + term
| expression - term
| term
term : term * factor
| term / factor
| factor
factor : NUMBER
| ( expression )