The Intro: I'm learning how to make my own language and the steps necessary to achieve that. I was trying to implement a lexical analyzer, but I'm getting an error even though I believe my logic is right. I want the program to skip comments instead of reading them.
The Problem: I'm getting the error "string index out of range" when I iterate over the characters and look for '\n' at the end of the comment line.
Python Code:
comment = ['//', '/*', '*/']
keyw = ["main", "void"]
br = ['(', ')', '{', '}']
lineCount = 1
temp = ''
flag = False
f = open('Program.C', 'r')
Program = f.read()
#print(Program)
for c in range(len(Program)):
    if Program[c] == ' ':
        continue
    if Program[c] == '\n':
        lineCount = lineCount + 1
        continue
    if Program[c] == '/':
        c = c + 1
        if Program[c] == '/':
            c = c + 1
            while Program[c] != '\n':
                c = c + 1
    if Program[c] in br:
        print(lineCount, "Brackets", Program[c])
    else:
        temp = temp + Program[c]
        print(temp)
        if temp in keyw:
            print(lineCount, "Keyword", temp)
            temp = ''
    print(Program[c])
Output:
while Program[c] != '\n':
IndexError: string index out of range
S
S
Sa
a
Saa
a
Saad
d
Process finished with exit code 1
Sample Input File:
// Saad
// Bhai
Besides answering your actual question, I would also like to give you some pointers on improving your Python code.
The answer to your actual problem is that your file does not end with a newline \n.
Although it is natural to assume that every line ends with one, there can be one exception: the last line of your file. When parsing the second line of your file, your while loop keeps searching for a \n character but doesn't find one, as your file simply ends after the final i.
You can confirm this by printing out all characters that are read:
>>> f = open('Program.C')
>>> print(list(f.read()))
['/', '/', ' ', 'S', 'a', 'a', 'd', '\n', '/', '/', ' ', 'B', 'h', 'a', 'i']
                                    ^^^^                                ^^^^
                            endline here                        but not here!
So instead of finding the \n character your while loop is looking for, your variable c is incremented beyond the length of your file input, causing the IndexError: string index out of range you encountered.
The simple fix would be to change your while loop to
while c < len(Program) and Program[c] != '\n':
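To see the guarded loop in isolation, here is a minimal sketch. The helper name skip_line_comment and the sample string are made up for illustration and are not part of your program; it assumes the index already points just past the //.

```python
def skip_line_comment(text, pos):
    """Return the index of the next newline at or after pos, or len(text) if none."""
    # Check the bound first: `and` short-circuits, so text[pos] is never
    # evaluated once pos has run past the end of the string.
    while pos < len(text) and text[pos] != '\n':
        pos += 1
    return pos


# Same contents as the sample input file: no newline after the last line.
sample = "// Saad\n// Bhai"

first = skip_line_comment(sample, 2)           # stops on the '\n' at index 7
second = skip_line_comment(sample, first + 3)  # no '\n' left, stops at the end
print(first, second, len(sample))
```

One related thing to keep in mind: inside for c in range(...), c is assigned afresh from the range at the start of every iteration, so increments made in the loop body do not carry over. If you want the scanner to really jump past a comment, a while loop that manages its own index (as the helper does) is one way to do it.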
Names starting with a capital letter are usually reserved for classes, so Program should be program. camelCase is also usually avoided for variable names, so lineCount becomes line_count.
with open(file) as f:
When you open a file yourself in Python, you should also close it. Because this is annoying, Python has the with statement, which automatically closes the file once you leave the with block:
with open(filename) as f:
    # file I/O
# file itself no longer needed
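If you want to convince yourself that the file really is closed, a small sketch like the following can help. It writes a temporary file with the same contents as your sample input (the temporary file is only there so the example runs on its own) and checks the closed attribute inside and outside the block.

```python
import os
import tempfile

# Create a throwaway file so the example runs without Program.C being present.
with tempfile.NamedTemporaryFile('w', suffix='.C', delete=False) as tmp:
    tmp.write("// Saad\n// Bhai")
    path = tmp.name

with open(path) as f:
    contents = f.read()
    print(f.closed)   # still inside the with block, so the file is open

print(f.closed)       # the block has been left, so the file is closed
print(repr(contents))

os.remove(path)       # clean up the throwaway file
```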
for-loops in Python
Any sequence-like type in Python has built-in iteration support. Instead of manually indexing, you can directly access the item you want. Compare, for my_list = [1, 4, 9]:
for i in range(len(my_list)):
    print(my_list[i])
with
for item in my_list:
    print(item)
If you still need the index additionally, you can use enumerate:
for i, item in enumerate(my_list):
    print(i, item)
Besides reading the whole file and iterating over every character in the string, Python also supports iterating over files on a line-by-line basis:
with open(filename) as file:
    # making use of enumerate()
    for line_num, line in enumerate(file, start=1):
        print(line_num, line)
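When iterating this way, each line keeps its trailing newline, except possibly the last one. The sketch below uses io.StringIO as a stand-in for the opened file (an assumption made only so it runs without Program.C on disk) and prints each line with repr() so the \n is visible.

```python
import io

# io.StringIO stands in for the opened file; it can be iterated line by line too.
fake_file = io.StringIO("// Saad\n// Bhai")

lines = []
for line_num, line in enumerate(fake_file, start=1):
    lines.append((line_num, line))
    # repr() makes the newline visible; endswith() tells the two cases apart.
    print(line_num, repr(line), line.endswith('\n'))
```

Calling line.rstrip('\n') copes with both cases, since it simply returns the line unchanged when there is no newline to strip.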
This is what I would make of the code you have posted, although as you get further into parsing, this may not be the best solution going forward (it probably won't be actually). It may still be a useful reference as a more 'pythonic' version of your posted code.
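keywords = ["main", "void"]
br = ['(', ')', '{', '}']
temp = ''

with open('Program.C', 'r') as file:
    for line_count, line in enumerate(file, start=1):
        line = line.lstrip(' ')
        # skip single-line comments entirely
        if line.startswith('//'):
            continue
        # drop the trailing newline, if any, before scanning characters
        for character in line.rstrip('\n'):
            # ignore spaces, as the original loop did
            if character == ' ':
                continue
            if character in br:
                print(line_count, "Brackets", character)
            else:
                temp += character
                print(temp)
                if temp in keywords:
                    print(line_count, "Keyword", temp)
                    temp = ''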
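If you want to experiment with this approach without touching a file, one option is to wrap the loop in a function that collects what it finds instead of printing it. The function name scan, the tuple layout, and the sample source below are all hypothetical choices for this sketch, and spaces are skipped as in your original loop.

```python
import io

KEYWORDS = {"main", "void"}
BRACKETS = {'(', ')', '{', '}'}


def scan(lines):
    """Collect (line number, kind, text) tuples from an iterable of lines."""
    tokens = []
    temp = ''
    for line_count, line in enumerate(lines, start=1):
        stripped = line.lstrip(' ')
        if stripped.startswith('//'):
            continue  # whole-line comment: nothing to scan
        for character in stripped.rstrip('\n'):
            if character == ' ':
                continue  # spaces are ignored, as in the original loop
            if character in BRACKETS:
                tokens.append((line_count, "Brackets", character))
            else:
                temp += character
                if temp in KEYWORDS:
                    tokens.append((line_count, "Keyword", temp))
                    temp = ''
    return tokens


# A made-up three-line program; the last line has no trailing newline.
source = io.StringIO("// Saad\nvoid main() {\n}")
for token in scan(source):
    print(token)
```

Returning a list rather than printing makes it easier to check the result in a test, and any iterable of lines (an open file, a list of strings, io.StringIO) can be passed in.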