Search code examples
pythongrammaryacclex

Grammar construction in ply - recursion allowed?


This has maybe been asked before but I do not know what to search for, really. Suppose I have some string I'd like to build a parser with.

I have strings like a OR b, b OR C but also a OR (b AND c). Now the nested parentheses cause trouble for me and I don't know how to construct the appropriate p_* functions. Is recursion allowed? If so, how?


This is what I have thus far:

import ply.lex as lex
import ply.yacc as yacc

# List of token names.   This is always required
tokens = (
    'VARIABLE',
    'OR',
    'AND',
    'PAR_OPEN',
    'PAR_CLOSE',
)

# Regular expression rules for simple tokens
t_ignore        = ' \t'
t_VARIABLE      = r'\b[a-z]+\b'
t_OR            = r'\bOR\b'
t_AND           = r'\bAND\b'
t_PAR_OPEN      = r'\('
t_PAR_CLOSE     = r'\)'

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()

def p_term(p):
    '''term : VARIABLE OR VARIABLE
            | VARIABLE AND VARIABLE
            | PAR_OPEN VARIABLE AND VARIABLE PAR_CLOSE'''

    if p[2] == 'AND':
        p[0] = "".join([p[1], p[3]])

    for idx, val in enumerate(p):
        print(idx, val)

def p_error(p):
    print("Syntax error in input!")
    print(p)


parser = yacc.yacc()
res = parser.parse("(a AND b)")
print(res)

I'd also like to call it with e.g. res = parser.parse("a OR (b AND c)") but to no avail.


P.S.: This is based on another one's question, really.


Solution

  • Parentheses in an expression is quite common. I will first refer you to part 5 of the PLY documentation which provides an example of nested expression parsing. Yes, recursion is the answer.

    There are several phrases that are used to mean the "smallest element of an expression." You might see "atom" or "term" (short for 'terminal') or "primary-expression".

    When dealing with parenthetical sub-expressions, this is generally the approach to take. Write a grammar rule that unifies the various low-level things (for example, literal numbers and variable names) and add the sub-expression at that point.

    In this example, from the PLY docs, expression is the highest-level thing, and supports addition and subtraction. The next level down is term which supports multiplication and division. The lowest-level thing is the factor, which doesn't support any operations, but does unify NUMBER and parenthetical sub-expressions. A factor could be 7 but it could also be (7 + 2 * 3).

     expression : expression + term
                | expression - term
                | term
    
     term       : term * factor
                | term / factor
                | factor
    
     factor     : NUMBER
                | ( expression )