How to interpret a "do" loop from scheme in python PLY

I'm doing an interpreter for Scheme using PLY (Python Lex-Yacc) and I can't implement a "do" loop using values from a stack that keeps track of the IDs that correspond to a variable (such as the i for the loop).

This is for a final project of a compilers design course, the main issue is the moment of adding a value to the stack, I'm using a dictionary to have the variable name as the key and then the value, but it is not being assigned in the moment it should, instead, it tries to do a comparison between a value and the variable but it fails because the stack is still empty.

This is the most important part of the code:

ids = { }

def p_program(p):
    'program : form'
    #return p
    print(p[1])

def p_form_a(p):
    '''
    form : definition
         | expression
    '''
    p[0] = p[1]

def p_expression(p):
    '''
    expression : constant
               | do_expression
               | ID
               | display
    '''
    p[0] = p[1]

def p_do_expression_a(p):
    #       0           1      2    3      4     5         6      7    8      9         10        11              12              13        14
    'do_expression : OPEN_PAR DO OPEN_PAR ID constant OPEN_PAR symbol ID expression CLOSE_PAR CLOSE_PAR comparison_expression expression CLOSE_PAR'
    ids[p[4]] = p[5]

    aux = p[12]
    while True:
        expr = p[13]

        if ((type(p[5]) == int   and type(p[9]) == int)
        or  (type(p[5]) == int   and type(p[9]) == float)
        or  (type(p[5]) == float and type(p[9]) == int)
        or  (type(p[5]) == float and type(p[9]) == float)):
            if   p[7] == '+': ids[p[4]] = ids[p[4]] + p[9]
            elif p[7] == '-': ids[p[4]] = ids[p[4]] - p[9]
            elif p[7] == '*': ids[p[4]] = ids[p[4]] * p[9]
            elif p[7] == '/': ids[p[4]] = ids[p[4]] / p[9]
            elif p[7] == 'remainder': ids[p[4]] = ids[p[4]] % p[9]
        else:
            print("Error: Type mismatch.")
            sys.exit()

        aux = p[12]
        if aux == '#t':
            break

    p[0] = expr

def p_comparison_expression(p):
    'comparison_expression : OPEN_PAR comparison expression expression CLOSE_PAR'

    if type(p[3]) == str:
        p[3] = ids[p[3]]

    if type(p[4]) == str:
        p[4] = ids[p[4]]

    if p[2] == 'eq?':
        if p[3] == p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != 'neq?':
        if p[3] != p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != '=':
        if p[3] == p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != '>':
        if p[3] > p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != '<':
        if p[3] < p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != '>=':
        if p[3] >= p[4]: p[0] = '#t'
        else: p[0] = '#f'
    elif p[2] != '<=':
        if p[3] <= p[4]: p[0] = '#t'
        else: p[0] = '#f'
    else:
        print("Error: Comparison problem.")
        sys.exit()

def p_display(p):
    'display : OPEN_PAR DISPLAY expression CLOSE_PAR'
    if type(p[3]) == str:
        p[3] = ids[p[3]]

    print(p[3])

def p_symbol(p):
    '''
    symbol : ADD
           | MINUS
           | DIVIDE
           | MULTIPLY
           | REMAINDER
    '''
    p[0] = p[1]

def p_boolean(p):
    '''
    boolean : TRUE
            | FALSE
    '''
    p[0] = p[1]

def p_comparison(p):
    '''
    comparison : EQQUES
               | NEQQUES
               | EQUALS
               | GREATER
               | LESS
               | GREATER_EQUAL
               | LESS_EQUAL
    '''
    p[0] = p[1]

def p_constant(p):
    '''
    constant : INT
             | FLOAT
             | CHARACTER
             | STRING
             | boolean
    '''
    p[0] = p[1]

I'm testing the following piece of Scheme code:

(do (i 0 (+ 1 i)) (< i 5) (display i))

which should display: 0 1 2 3 4

but instead I get:

Traceback (most recent call last):
    File "C:\Compiladores\Lexer\scheme_compiler.py", line 510, in <module>
        parser.parse(s)
    File "C:\Compiladores\Lexer\ply\yacc.py", line 333, in parse
        return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    File "C:\Compiladores\Lexer\ply\yacc.py", line 1120, in parseopt_notrack
        p.callable(pslice)
    File "C:\Compiladores\Lexer\scheme_compiler.py", line 338, in p_comparison_expression
        p[3] = ids[p[3]]
KeyError: 'i'

Can someone please help me achieve it? I'd appreciate it a lot

Solution

Your do form grammar is (split into lines and using single character literals for readability):

do_expression : '(' DO '(' ID constant '(' symbol ID expression ')' ')'
                       comparison_expression expression ')'

(Note: This is not actually the correct grammar for a number of reasons, one of which is noted in a different answer. But that's not relevant to this question.)

In your semantic action, p[12] and p[13] correspond to comparison_expression and expression. Stripped to its essentials, your semantic action does the following:

# create a new bound variable with the indicated initial value (`p[5]`).
aux = p[12]
while True:
    expr = p[13]    # You have a typo; I assume you meant p[13], not [13]
    # Update the index variable's value
    aux = p[12]     # This value is still the same as it was at entry
    if aux == '#t':
        break

p[0] = expr

Now, it's important to reflect on what p[12] and p[13] are. Ply does not do any magic under the Python hood; it is simply generating Python code. So p[12] and p[13] are ordinary Python values, which are the result of executing the semantic actions for the comparison_expression and expression non-terminals. And those semantic actions are evaluated before the do_expression has been reduced, so their values are computed without any reference to the do_expression. Both comparison_expression and expression refer to the bound variable i (as is natural for an iteration construct), but that variable is not bound when those semantic actions are evaluated. Hence the error message.

But the logic is more fundamentally flawed than that. In your model, the comparison and action expressions are evaluated exactly once, when they are parsed. But that's not the semantics of the loop construct; in the loop semantics the comparison expression is evaluated repeatedly until it indicates the the loop is done, and the action expression might not be evaluated at all if the comparison fails on the first bound value.

You seem to be assuming that accessing p[12] and p[13] will somehow reevaluate the associated semantic actions. But Python doesn't have such a facility and Ply doesn't magically implement one either. That's your responsibility, based on the intended semantics of the language you're trying to compile (or, in this case, interpret).

To accomplish that, you need to transform the parsed input into some kind of data structure which you can later evaluate (or not, as the case may be). So you have to arrange for the value returned by the semantic actions to be a description of the parsed code, not an immediate evaluation (which would not be meaningful anyway in the absence of variable bindings).

In the case of Scheme, parsing is really the least of the problems. Although special forms do slightly complicate the task, Scheme programs are basically S-expressions, which can be almost trivially converted into lists without any requirement for a sophisticated parsing technology. That was the original intention of the Scheme (or, rather, Lisp) syntax. Once you have the list structure, or some functional equivalent (an Abstract Syntax Tree or even three-address code), you can evaluate the program text as necessary when necessary and with the correct variable bindings.

Once upon a time, no-one would have thought of assigning a task like this without reference to Abelson & Sussman's excellent (and still relevant) textbook Structure and Interpretation of Computer Programs, affectionately referred to as SICP. Thanks to the generosity of the authors and the publisher, the full text of that book is freely available online. I cannot recommend it enough.

P.S.: Also, note that the binding of the loop control variable (i in this case) is only present during the evaluation of the loop. Once the loop terminates, the variable should be removed. But you can't model that with a simple dictionary, because there might be an outer variable with the same name. That's all explained in SICP, too.