Search code examples
pythonstring-matchingrecursive-descent

Recursive descent parser implementation in python


The objective of the python code(3.5) is to read standard C codes for the following rules:

Program  --> main () { declarations   statement-list } 
declarations--> data-type identifier-list; declarations |epsilon  
data-type --> int|char identifier-list --> id|id, identifier-list   
statement_list --> statement   statement_list| epsilon 
statement -->    looping-stat 
looping-stat --> while (expn) {assignment_stat} |
        for    (assignment_stat ; expn ; increment_stat )       { assignment_stat }    
expn--> factor eprime 
eprime-->relop factor|epsilon 
assignment_stat    --> id = factor 
increment_stat --> id  inc_dec 
inc_dec --> ++|-- 
factor --> id|num 
relop --> = =|!=|<=|>=|>|<

I understand that the method is to use consecutive procedure calls (For eg. for main() call declaration() and statements with it). Idea was to read lines from a text into a list and try to parse it. I am confused on rules like declaration. For example

int id;

and

while(a<100)

Any help will be appreciated.

A trial code:

#for input
def file_input():
    global buffer,lookahead

    buffer=list()
    file=open("parser.txt","r")
    lines=file.readlines()
    file.close()
    for line in lines:
        line=line.strip()
        #print(line)
        buffer.append(line)

    return(buffer)

#for the while loop
def while_loop():
    global lookahead,buffer

    if "while_loop" in buffer[lookahead]:
        print("Parsing",buffer[lookahead])
        match('while_loop')
        expression()
        assignment()

#matching   
def match(t):
    global lookahead,buffer

    print('matching', t)
    if buffer[lookahead] == "t":
        lookahead = lookahead + 1
    else:
        print('error')

Solution

  • Where are you confused? You're doing fine so far.

    You aren't coding individual statement types: you're using a general process to code grammar rules. Find each token in the order given on the RHS of the grammar rule. If the token is a terminal, use your match routine. If it's a non-terminal, call that non-terminal's function.

    When you have a choice of RHS expansion, such as with the loop statement, you need to use the lookahead to decide which routine to call. Your function looping_stat has to look at the next token and call either while_loop or for_loop.

    Got it?