
variable stores EQUALS instead of string | python based interpreter


I've been making an interpreter in Python and I've run into a problem. It identifies strings, variables, numbers and expressions. While testing the .lang file I noticed it output {'$var': 'EQUALS'} instead of the variable's string or number. It handles the "print", "num" and "expr" values perfectly, but not variables. At first I tried re-evaluating the code and changed symbols[varname[4:]] = varvalue to symbols[varname[4:]] = varvalue[6:], which gave me {'$var': ' '}.

It recognizes "variable" as the string associated with $var but stores {'$var': 'EQUALS'}, that is, the equals sign that comes before the string, number or expression. I want it to store "variable" as the value of the variable, as in {'$var': 'STRING: "variable"'}.

I seem to be missing something, but I think the problem may be in my parse or doASSIGN function. Could someone please tell me what I might be doing wrong, or give me a hint?

OUTPUT:
PS C:\Users\<user>\Desktop\spl> python basic.py test.lang
hello world
55
48
{'$var': 'EQUALS'}

test.lang:
print "hello world"
print 55
print (10 + 2) * 4
$var = "variable"

basic.py:
from sys import *

tokens = []
num_stack = []
symbols = {}

def open_file(filename):
    data = open(filename, 'r').read()
    data += "<EOF>"
    return data

def lex(filecontents):
    tok = ""
    state = 0
    varstarted = 0
    var = ""
    string = ""
    expr = ""
    n = ""
    isexpr = 0
    for char in filecontents:
        tok += char
        if tok == " ":
            if state == 0:
                tok = ""
            else:
                tok = " "
        elif tok == "\n" or tok =="<EOF>":
            if expr != "" and isexpr == 1:
                tokens.append("EXPR:" + expr)
                expr = ""
            elif expr != "" and isexpr == 0:
                tokens.append("NUM:" + expr)
                expr = ""
            elif var != "":
                tokens.append("VAR:" + var)
                var = ""
                varstarted = 0
            tok = ""
        elif tok == "=" and state == 0:
            if var != "":
                tokens.append("VAR:" + var)
                var = ""
                varstarted = 0
            tokens.append("EQUALS")
            tok = ""
        elif tok == "$" and state == 0:
            varstarted = 1
            var += tok
            tok = ""
        elif varstarted == 1:
            var += tok
            tok = ""
        elif tok == "PRINT" or tok == "print":
            tokens.append("PRINT")
            tok = ""
        elif tok == "0" or tok == "1" or tok == "2" or tok == "3" or tok == "4" or tok == "5" or tok == "6" or tok == "7" or tok == "8" or tok == "9":
            expr += tok
            tok = ""
        elif tok == "+" or tok == "-" or tok == "*" or tok == "/" or tok == "(" or tok == ")":
            isexpr = 1
            expr += tok
            tok = ""
        elif tok == "\"":
            if state == 0:
                state = 1
            elif state == 1:
                tokens.append("STRING:" + string + "\"")
                string = ""
                state = 0
                tok = ""
        elif state == 1:
            string += tok
            tok = ""
    #print(tokens)
    #return ''
    return tokens

def evalExpression(expr):
    return eval(expr)

def doPRINT(toPRINT):
    if(toPRINT[0:6] == "STRING"):
        toPRINT = toPRINT[8:]
        toPRINT = toPRINT[:-1]
    elif(toPRINT[0:3] == "NUM"):
        toPRINT = toPRINT[4:]
    elif(toPRINT[0:4] == "EXPR"):
        toPRINT = evalExpression(toPRINT[5:])
    print(toPRINT)

def doASSIGN(varname, varvalue):
    symbols[varname[4:]] = varvalue

def parse(toks):
    i = 0
    while(i < len(toks) - 1):
        if toks[i] + " " + toks[i+1][0:6] == "PRINT STRING" or toks[i] + " " + toks[i+1][0:3] == "PRINT NUM" or toks[i] + " " + toks[i+1][0:4] == "PRINT EXPR":
            if toks[i+1][0:6] == "STRING":
                doPRINT(toks[i+1])
            elif toks[i+1][0:3] == "NUM":
                doPRINT(toks[i+1])
            elif toks[i+1][0:4] == "EXPR":
                doPRINT(toks[i+1])
            i+= 2
        if toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:6] == "VAR EQUALS STRING" or toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:3] == "VAR EQUALS NUM" or toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:4] == "VAR EQUALS EXPR":
            if toks[i+2][0:6] == "STRING":
                doASSIGN(toks[i],toks[i+1])
            elif toks[i+2][0:3] == "NUM":
                doASSIGN(toks[i],toks[i+1])
            elif toks[i+2][0:4] == "EXPR":
                doASSIGN(evalExpression(toks[i+2][5:]))
            i += 3
    print(symbols)

def run():
    data = open_file(argv[1])
    toks = lex(data)
    parse(toks)

run()

Solution

  • The problem can be fixed by changing a single character in your parser. As someone has already pointed out, the following snippet from your lexer appends EQUALS every time a = is found outside a string in the source program:

    ...
    elif tok == "=" and state == 0:
        if var != "":
            tokens.append("VAR:" + var)
            var = ""
            varstarted = 0
        tokens.append("EQUALS")
        tok = ""
    ...
    

    Explanation

    This is because your lexer iterates through the source file (test.lang) character by character and emits a token whenever the accumulated text matches something it knows. As soon as it finds a = outside a string, EQUALS is appended to the tokens list, whatever comes before or after it. Making the EQUALS token context-aware is possible, but it takes some work.
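To make this concrete, here is roughly what the token list looks like for the assignment line of test.lang. The list is written out by hand from reading lex(), so treat it as an illustration rather than captured program output:

```python
# Tokens the lexer would emit for the line:  $var = "variable"
# (hand-written from reading lex(); not captured program output)
toks = ['VAR:$var', 'EQUALS', 'STRING:"variable"']

i = 0
name_token = toks[i]          # the variable name, with its "VAR:" prefix
operator_token = toks[i + 1]  # always the EQUALS marker
value_token = toks[i + 2]     # the value being assigned

print(operator_token)
print(value_token)
```

The value you want to store sits two positions after the variable token, not one.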

    So, while the lexer could be changed to address this, it is not necessary as long as the language stays as simple as it is now. By this I mean that the syntax for declaring variables doesn't get more complex than $var = "string". Once it does (for example, with type declarations such as $var str = "string"), changes to the lexer would be appropriate.

    Solution

    Anyway, let's take a look at a simple fix:

    if toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:6] == "VAR EQUALS STRING" or ...:
        if toks[i+2][0:6] == "STRING":
            doASSIGN(toks[i], toks[i+2])  # changed: toks[i+2] instead of toks[i+1]
    

    Instead of doASSIGN(toks[i], toks[i+1]) I used doASSIGN(toks[i], toks[i+2]), because toks[i+1] lands directly on EQUALS instead of the STRING token, which is what produced {'$var': 'EQUALS'}. After playing around with the program for a while, I found no reason not to do this: the EQUALS token only confirms that the statement is an assignment, and it does not need to be stored unless you later want it for something like program optimization.
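The same index change applies to the NUM branch, and the EXPR branch as posted passes only one argument to doASSIGN, which would raise a TypeError if it were ever reached. Below is a minimal sketch of how all three cases could be handled in one place. The helper name assign_from_tokens and the choice to store an evaluated expression as a NUM token are my own assumptions, not part of your program:

```python
symbols = {}

def doASSIGN(varname, varvalue):
    # Same as the original: strip the "VAR:" prefix and store the value token.
    symbols[varname[4:]] = varvalue

def assign_from_tokens(toks, i):
    """Hypothetical helper: handle VAR EQUALS <value> starting at index i."""
    value = toks[i + 2]  # skip toks[i + 1], which is always "EQUALS"
    if value.startswith("EXPR:"):
        # Assumption: evaluate the expression and store the result as a NUM token.
        value = "NUM:" + str(eval(value[5:]))
    doASSIGN(toks[i], value)

assign_from_tokens(['VAR:$var', 'EQUALS', 'STRING:"variable"'], 0)
assign_from_tokens(['VAR:$n', 'EQUALS', 'EXPR:(10+2)*4'], 0)
print(symbols)  # {'$var': 'STRING:"variable"', '$n': 'NUM:48'}
```

STRING and NUM tokens are stored unchanged, so only the EXPR case needs extra work.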

    Now, is this the most elegant, best-practice fix? Probably not, but it is frankly the easiest and fastest one. If you need to record the presence of EQUALS somewhere, you can always add a third, optional parameter to your doASSIGN function.
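One way the optional third parameter might look is sketched here. The parameter name operator and the assignment_log list are hypothetical names I made up for illustration:

```python
symbols = {}
assignment_log = []  # hypothetical: remembers which operator token was used

def doASSIGN(varname, varvalue, operator=None):
    # Store the value under the variable name without its "VAR:" prefix.
    symbols[varname[4:]] = varvalue
    # Only record the operator when the caller chose to pass it.
    if operator is not None:
        assignment_log.append((varname[4:], operator))

toks = ['VAR:$var', 'EQUALS', 'STRING:"variable"']
doASSIGN(toks[0], toks[2])           # existing two-argument calls still work
doASSIGN(toks[0], toks[2], toks[1])  # optionally pass the EQUALS token too
print(symbols)
print(assignment_log)
```

Because the parameter defaults to None, none of the existing call sites have to change.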

    However, it achieves the same result as the other fixes with little effort and hardly any rewriting. Alternatively, you could implement look-ahead in the lexer, for example with itertools. Look-ahead is a common technique when lexing a source file.
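As a rough sketch of what look-ahead with itertools could look like, the generator below pairs every character with the one that follows it. The names with_lookahead and lex_operators, and the EQEQ token for ==, are assumptions for illustration only, since your language has no == operator yet:

```python
from itertools import tee, zip_longest

def with_lookahead(iterable, fill=None):
    """Yield (current, next) pairs; next is `fill` for the last item."""
    current, ahead = tee(iterable)
    next(ahead, None)  # advance the second iterator by one position
    return zip_longest(current, ahead, fillvalue=fill)

def lex_operators(source):
    """Hypothetical mini-lexer that tells '=' apart from '=='."""
    tokens = []
    skip = False
    for char, nxt in with_lookahead(source):
        if skip:  # second half of a two-character operator
            skip = False
            continue
        if char == "=" and nxt == "=":
            tokens.append("EQEQ")
            skip = True
        elif char == "=":
            tokens.append("EQUALS")
    return tokens

print(lex_operators("a = b == c"))  # ['EQUALS', 'EQEQ']
```

With one character of look-ahead the lexer can decide what a = means before it emits a token, which is the kind of context the current character-by-character loop lacks.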

    All other solutions would require most of your program to be rewritten, which is something I want to avoid, as it should remain your own work. Nonetheless, I would suggest you look a little deeper into interpreter design and check out some best practices. You might also want to look at popular libraries for building interpreters and compilers. :)