Python PLY Yacc "syntax error"


Okay, so I'm trying to build a parser of my mini-language (obviously), and setting variables seems to be properly working. But as soon as Yacc comes across a function definition, it just gives me a syntax error, and a couple of EOF errors (which I know are from when Yacc has no remaining rules to set) and nothing else happens... Where did I go wrong?

Here's an example of the syntax I'm parsing:

$name = "John Doe"
$age = 72
$waterInOceans = 95.4

!testFunction {

}

Where the !testFunction { } section is defining a function (based off of the exclamation point). I don't know if that's going to be useful in debugging.

# The Lexer

import ply.lex as lex

tokens = ["MINUS", "SEPARATOR", "MODIFIER", "FUNCTION_NAME", "UNDEF_BLOCK", "VARIABLE_NAME", "EQUALS", "STRING", "FLOAT", "INNER_CONTENT", "ARGUMENTS", "INTEGER", "PLUS"]

def t_ARGUMENTS(t): # Finds arguments in calls and function definitions
    r'\(.*\)'
    t.value = t.value[1:-1] # strip parenthesis
    t.value = t.value.split(" && ")
    return t

def t_STRING(t): # finds strings
    r'"\w.+"'
    t.value = t.value[1:-1] # strips the quotation marks of the string
    return t

def t_FLOAT(t): # finds floats
    r'\d+.\d+'
    t.value = float(t.value)
    return t

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_VARIABLE_NAME(t):
    r'\$\w*\b'
    t.value = t.value[1:]
    return t

def t_INNER_CONTENT(t):
    r'\{\n.*\n\}|\{.*\}'
    t.value = t.value[1:-1]
    return t

def t_FUNCTION_NAME(t):
    r'!\w+'
    t.value = t.value[1:]
    return t

t_ignore = r"\n|\t|\r"
t_EQUALS = r"\="
t_PLUS = r"\+"
t_MINUS = r"-"
t_MODIFIER = r"\."
t_SEPARATOR = r"\,"

t_UNDEF_BLOCK = r"\w+" # Any block of text that is left over and isn't assigned by the end (used by functions)

def t_error(t):
    t.lexer.skip(1)

lex.lex()

#opened = open("example.zeq", "r")
#content = opened.read()
#opened.close()

#lex.input(content)

And then the Yacc half:

# The Yacc parser

import ply.yacc as yacc
import compiler # Get the compiler (tokenizer; compiler.py) which generates tokens
import sys
from os import system


##############
### IGNORE ###
tokens = compiler.tokens
#system("clear")
print("Executing "+sys.argv[1]+" |\n"+("-"*(len(sys.argv[1])+12)))
### IGNORE ###
##############


VARIABLES = {}
FUNCTIONS = {}

def p_assign(p): # Set new variable
    '''assignment : VARIABLE_NAME EQUALS compound
                  | VARIABLE_NAME EQUALS STRING
                  | VARIABLE_NAME EQUALS INTEGER
                  | VARIABLE_NAME EQUALS FLOAT'''

    #print("Setting '{}' to '{}'...".format(str(p[1]), str(p[3])))
    VARIABLES[p[1]] = p[3]

def p_number(p): # Combines floats and integers into a blanket non-terminal for simplicity sakes
    '''number : FLOAT
              | INTEGER'''
    p[0] = p[1]

def p_compound(p): # Complete the value *before* the variable is assigned!
    '''compound : number PLUS number
                | number MINUS number'''

    type1 = type(p[1])
    type2 = type(p[3])
    operator = p[2]
    if operator == "+":
        p[0] = p[1] + p[3]
    elif operator == "-":
        p[0] = p[1] - p[3]

def p_undefined(p):
    '''undefined : UNDEF_BLOCK'''
    print("Undefined block")

def p_function(p):
    '''function : FUNCTION_NAME INNER_CONTENT'''

    print("Creating a function")

    name = p[1]
    content = p[2]

    FUNCTIONS[name] = content

def p_empty(p):
    '''empty : '''

#~ def p_error(p):
    #~ if p:
        #~ print("Syntax error: "+p.type)
    #~ else:
        #~ pass

parser = yacc.yacc()

opened = open(sys.argv[1], "r")
content = opened.read()
opened.close()

for line in content.splitlines():
    parser.parse(line)

print(VARIABLES)
print(FUNCTIONS)

I'm waiting for it to be a simple overlooked detail...


Solution

  • When you ask Ply (or yacc, for that matter) to parse an input, it attempts to recognize a single instance of the top-level non-terminal (or "starting symbol"). This will usually be a grammatical description of the entire input, so it will often have a name like program, although there are use cases in which it is useful to parse just a part of the input.

    Ply (and yacc) assume that the first grammar production is for the starting symbol. In your case, the first production is assignment, and so that is what it will try to parse (and nothing else). assignment cannot derive a function definition or any other statement type, so those cause syntax errors.

    If you want to explicitly tell Ply what the top-level symbol is, you can do so. See the manual section on starting symbols.
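
As a rough illustration of the point above, here is a minimal sketch of a grammar whose starting symbol covers the whole input. It assumes PLY 3.x (the write_tables argument belongs to that series), and it uses a deliberately simplified lexer rather than the one in the question. The program and statement non-terminals, the reduced token list, and the t_ignore and INNER_CONTENT definitions are all assumptions made for the example, not part of the original code.

```python
import ply.lex as lex
import ply.yacc as yacc

# Simplified token set for illustration only (not the question's full lexer).
tokens = ("VARIABLE_NAME", "FUNCTION_NAME", "EQUALS", "INTEGER", "INNER_CONTENT")

t_EQUALS = r"="
# In PLY, t_ignore is a plain string of characters to skip, not a regex.
t_ignore = " \t\r\n"

def t_VARIABLE_NAME(t):
    r"\$\w+"
    t.value = t.value[1:]          # drop the leading '$'
    return t

def t_FUNCTION_NAME(t):
    r"!\w+"
    t.value = t.value[1:]          # drop the leading '!'
    return t

def t_INTEGER(t):
    r"\d+"
    t.value = int(t.value)
    return t

def t_INNER_CONTENT(t):
    r"\{[^}]*\}"
    t.value = t.value[1:-1].strip()  # drop the braces and surrounding whitespace
    return t

def t_error(t):
    t.lexer.skip(1)

VARIABLES = {}
FUNCTIONS = {}

# Explicit starting symbol; without it, PLY uses the first production it finds.
start = "program"

def p_program(p):
    """program : program statement
               | statement"""

def p_statement(p):
    """statement : assignment
                 | function"""

def p_assignment(p):
    """assignment : VARIABLE_NAME EQUALS INTEGER"""
    VARIABLES[p[1]] = p[3]

def p_function(p):
    """function : FUNCTION_NAME INNER_CONTENT"""
    FUNCTIONS[p[1]] = p[2]

def p_error(p):
    print("Syntax error at", p)

lexer = lex.lex()
# debug/write_tables are PLY 3.x options that suppress parser.out and parsetab.py.
parser = yacc.yacc(debug=False, write_tables=False)

# Parse the whole input in one call, since a function definition spans lines.
source = '$age = 72\n!testFunction {\n\n}\n'
parser.parse(source, lexer=lexer)
print(VARIABLES, FUNCTIONS)
```

With this sketch, a single parse call should accept both the assignment and the function definition, leaving age in VARIABLES and testFunction in FUNCTIONS. Because program is left-recursive over statement, any number of statements can follow one another; adding a new statement type means adding one alternative to statement rather than reordering the rules.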