python-3.x compiler-construction lexical-analysis

Construction of a Lexical Analyzer: String Index Out of Range in While Loop

The Intro: I'm learning how to make my own language and the steps necessary to achieve that. I was trying to implement a lexical analyzer, but I'm getting an error even though I think my logic is right. I want the program to skip comments instead of reading them.

The Problem: I'm getting the error "string index out of range" when I try to iterate over the characters and look for '\n' at the end of the comment line.

Python Code:

comment = ['//', '/*', '*/']
keyw = ["main", "void"]
br = ['(', ')', '{', '}']
lineCount = 1
temp = ''
flag = False
f = open('Program.C', 'r')
Program = f.read()
#print(Program)

for c in range(len(Program)):
    if Program[c] == ' ':
        continue
    if Program[c] == '\n':
        lineCount = lineCount + 1
        continue
    if Program[c] == '/':
        c = c + 1
        if Program[c] == '/':
            c = c + 1
            while Program[c] != '\n':
                c = c +1
    if Program[c] in br:
        print(lineCount, "Brackets", Program[c])
    else:
        temp = temp + Program[c]
        print(temp)
        if temp in keyw:
            print(lineCount, "Keyword", temp)
            temp = ''
    print(Program[c])

Output:

while Program[c] != '\n':
IndexError: string index out of range
 S
S
 Sa
a
 Saa
a
 Saad
d
Process finished with exit code 1

Sample Input File:

// Saad
// Bhai

Solution

Besides answering your actual question, I would also like to give you some pointers on improving your Python code.

Your actual question: Your second line does not end with `\n`

The answer to your actual problem is that your file does not end with a newline \n.

Although it's a natural assumption that every line does, there can be one exception: the last line of your file. When parsing the second line of your file, your while loop keeps searching for a \n character but doesn't find one, as your file simply ends after the i.

You can confirm this by printing out all characters that are read:

>>> f = open('Program.C')
>>> print(list(f.read()))
['/', '/', ' ', 'S', 'a', 'a', 'd', '\n', '/', '/', ' ', 'B', 'h', 'a', 'i']
                                    ^^^^                                    ^^^^
                               endline here                         but not here!

So instead of finding the \n character your while loop is looking for, your variable c is incremented to beyond the length of your file input, causing the IndexError: string index out of range you encountered.
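To see the failure without the file, here is a small sketch that runs the same unguarded scan on a string. The string literal is an assumption standing in for the contents of Program.C shown above, and error_message is just a name I picked for illustration.

```python
# Same content as the sample file: note there is no trailing newline.
program = "// Saad\n// Bhai"

c = program.rindex('//')       # index where the last comment starts
error_message = None
try:
    while program[c] != '\n':  # unguarded scan, as in the question
        c += 1
except IndexError as exc:
    error_message = str(exc)

print(c, len(program))   # 15 15
print(error_message)
```

The index walks all the way to len(program), which is one past the last valid position, and that is the lookup that raises.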

The simple fix would be to change your while loop to

while c < len(Program) and Program[c] != '\n':
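Two things are worth keeping in mind with that one-line change. Reassigning c inside a `for c in range(...)` loop does not affect the next iteration, and once the guarded loop stops at the end of the input, any later Program[c] in the same iteration would still be out of range. A minimal sketch of one way around both, assuming you drive the index yourself with a while loop (the function name skip_line_comments and its return format are hypothetical, not part of your code):

```python
def skip_line_comments(program):
    """Return (line_number, character) pairs for everything outside // comments."""
    tokens = []
    line_count = 1
    c = 0
    while c < len(program):
        ch = program[c]
        if ch == '\n':
            line_count += 1
            c += 1
            continue
        # Look ahead one character, but only if that character exists.
        if ch == '/' and c + 1 < len(program) and program[c + 1] == '/':
            # Bounds check comes first, so a comment on the last line ends cleanly.
            while c < len(program) and program[c] != '\n':
                c += 1
            continue  # the newline (if any) is handled on the next pass
        if ch != ' ':
            tokens.append((line_count, ch))
        c += 1
    return tokens


print(skip_line_comments("// Saad\n// Bhai"))  # []
```

Because the index is only ever advanced by the loop body, skipping ahead past a comment actually sticks, and every lookup is preceded by a length check.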

Improving your Python

Naming conventions

Names starting with a capital letter are usually reserved for classes, so Program should be program. camelCase is also usually avoided for variable names, so lineCount becomes line_count.

Opening files in Python: `with open(file) as f:`

When you open a file yourself in Python, you should also close it. Because this is annoying, Python has the with-statement that automatically closes it once you leave the with block:

with open(filename) as f:
    # file I/O

# file itself no longer needed
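If you want to convince yourself that the file really is closed afterwards, here is a small runnable sketch. It writes a throwaway temporary file first so it works anywhere; the temporary file and the name file_closed_after_block are assumptions for the sake of the example, not something from your program.

```python
import os
import tempfile

# Create a throwaway file with the same content as the sample input.
with tempfile.NamedTemporaryFile('w', suffix='.C', delete=False) as tmp:
    tmp.write('// Saad\n// Bhai')
    path = tmp.name

with open(path) as f:
    program = f.read()

# Outside the with block: the contents are still available, the file is closed.
file_closed_after_block = f.closed
print(file_closed_after_block)  # True
print(program.endswith('\n'))   # False

os.remove(path)  # clean up the temporary file
```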

`for`-loops in Python

Any sequence-like type in Python has built-in iteration support. Instead of manually indexing, you can directly access the item you want. Compare for my_list = [1, 4, 9]:

for i in range(len(my_list)):
    print(my_list[i])

with

for item in my_list:
    print(item)

If you still need the index additionally, you can use enumerate:

for i, item in enumerate(my_list):
    print(i, item)

Iterating over files

Besides reading the whole file and iterating over every character in the string, Python also supports iterating over files on a line-by-line basis:

with open(filename) as file:
    # making use of enumerate()
    for line_num, line in enumerate(file, start=1):
        print(line_num, line)

My version

This is what I would make of the code you have posted, although as you get further into parsing, this may not be the best solution going forward (it probably won't be actually). It may still be a useful reference as a more 'pythonic' version of your posted code.

br = ['(', ')', '{', '}']
keywords = ["main", "void"]
temp = ''

with open('Program.C', 'r') as file:
    for line_count, line in enumerate(file, start=1):
        line = line.lstrip(' ')

        # skip single-line comments entirely
        if line.startswith('//'):
            continue

        # strip the trailing newline (if there is one) before scanning
        for character in line.rstrip('\n'):
            if character in br:
                print(line_count, "Brackets", character)
            else:
                temp += character
                print(temp)
                if temp in keywords:
                    print(line_count, "Keyword", temp)
                    temp = ''