python  compiler-construction  ply

Identify declaration chain with PLY


I want to recognise declarations of the form:

int a, b, c ...;

So I wrote:

def p_list_variable(p):
    '''list_variable : type ID COMMA list_variable
                     | type ID'''
    if len(p) == 2:
        p[0] = [p[1]]
    else:
        p[0] = [p[1]] + p[3]


def p_type(p):
    '''type : TYPE_INT
            | TYPE_FLOAT
            | TYPE_CHAR
            | TYPE_DOUBLE'''
    p[0] = p[1]

But that doesn't work:

yacc: Syntax error at line 1, token=COMMA

How could I make the function work properly? Thanks in advance!


Solution

  • Your grammar says:

    list_variable : type ID COMMA list_variable
                  | type ID
    

    In other words, a list_variable is a type and an ID, possibly followed by a comma and another list_variable.

    So perhaps it's int a. Or perhaps it's int a , int b (a type, an ID, a COMMA and another list_variable, which itself starts with a type). I doubt that's what you wanted.

    From your example, I guess you want a declaration to be a type followed by a list of variables and terminated with a semicolon.

    There are two ways to write simple lists of things: left-recursive and right-recursive. I prefer left-recursive, in which we say that a variable list might be a single variable ID, or it might be a variable list followed by a comma and a variable ID. But you could also use the right-recursive version, where you say that a variable list is an ID, possibly followed by a comma and another variable list. Either way, the grammar is a simple transcription of the description.

    Here's the left-recursive version:

    declaration: type variable_list ';'
    variable_list: ID
    variable_list: variable_list ',' ID
    

    When we convert that to PLY, we match productions with semantic actions. PLY recognises the productions to be applied, finds the appropriate semantic action function, and calls it. Although we're allowed to merge two or more productions in a single semantic action function, it makes little sense to do that if we are immediately going to ask which production was matched. If we have two different actions for the two different productions, there was no point combining the productions in the first place.
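To illustrate the point about merged productions, here is a sketch of what a single combined action for the two variable_list productions could look like. The function name p_variable_list is my own, and the plain lists used to exercise it are only stand-ins for the production object PLY passes to an action (which supports indexing and len() in the same way). Keep in mind that len(p) is one more than the number of right-hand-side symbols, because p[0] holds the result.

```python
def p_variable_list(p):
    '''variable_list : ID
                     | variable_list ',' ID'''
    # len(p) counts p[0] plus every symbol on the right-hand side.
    if len(p) == 2:
        # variable_list : ID
        p[0] = [p[1]]
    else:
        # variable_list : variable_list ',' ID
        p[0] = p[1] + [p[3]]


# Plain lists imitate the object PLY would hand to the action (assumption
# made for illustration only; no parser is built here).
single = [None, 'a']
p_variable_list(single)

longer = [None, single[0], ',', 'b']
p_variable_list(longer)

print(single[0])
print(longer[0])
```

The length check is exactly the "which production was matched?" question. It is also easy to get wrong: in the question's code, the production type ID gives len(p) == 3, so the len(p) == 2 branch can never be taken. Separate functions avoid the check altogether.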

    So I would write that in PLY as follows:

    def p_declaration(p):
        ''' declaration : type variable_list ';' '''
        p[0] = ['DECLARE', p[1], p[2]]
    
    def p_one_variable(p):
        ''' variable_list : ID '''
        p[0] = [ p[1] ]
    
    def p_more_variables(p):
        ''' variable_list : variable_list ',' ID '''
        p[0] = p[1]
        p[0].append(p[3])    # ID is the third symbol on the right hand side
    
    # p_type as is in your example code
    

    You should play with that to see how it works.
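For comparison, the right-recursive alternative described earlier might be sketched as follows. The function names are hypothetical, and these two rules would replace p_one_variable and p_more_variables rather than sit alongside them. The plain lists at the bottom only imitate, by hand, the order in which the reductions would happen for a, b, c; they are not how PLY is normally driven.

```python
def p_last_variable(p):
    ''' variable_list : ID '''
    p[0] = [p[1]]


def p_variable_and_rest(p):
    ''' variable_list : ID ',' variable_list '''
    # Here the rest of the list is the third symbol on the right-hand side.
    p[0] = [p[1]] + p[3]


# Hand simulation for "a, b, c": with right recursion the innermost
# (rightmost) list is reduced first, then the parser works back to the left.
step1 = [None, 'c']
p_last_variable(step1)

step2 = [None, 'b', ',', step1[0]]
p_variable_and_rest(step2)

step3 = [None, 'a', ',', step2[0]]
p_variable_and_rest(step3)

print(step3[0])
```

Both versions build the same list. The practical difference is that the right-recursive form keeps every ID on the parser stack until the last one has been read, which is one reason left recursion is generally preferred with LALR parser generators such as PLY.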

    In order to be able to use the (imho) more readable literal character token syntax, you need to add the following to your lexer definition:

    literals = ',;'   # Add any other single-character tokens you want to use