I have read a number of posts on how to handle the StopIteration error in Python, but I still had trouble solving my particular case. I have a CSV file of prefixes with a header row and two columns, Word and Count, where Count is the number of times that prefix occurs. I also have another file with a list of company names; the prefixes in the first file were taken from the first word of each company name in the second. I'm trying to remove duplicates, and what I want to do right now is:
Ignore the StopIteration error every time it occurs.
In other words, instead of writing all the commented-out "if" statements below, I want a single check: if a StopIteration error is raised (which happens when the search loop reaches the end of the prefix file without finding a matching Word), ignore it by treating the problematic prefix as if it occurred at least twice in the prefix file, so that the function returns the company name without the prefix. I realize this ignores the fact that the prefix stored in the prefix file differs from the actual prefix of the company name, but usually the mismatch comes from non-American-English letters being stored differently by Python and Excel, plus a few other causes that don't seem systematic, so I'll just fix those manually later.
My code is:
import csv

def remove_prefix(prefix, first_name):
    #try:
        #EXCEPTIONS:
        #if '(' in prefix:
        #    prefix = prefix[1:]
        #if ')' in prefix:
        #    prefix = prefix[:-1]
    """
    if prefix == "2-10":
        prefix = "2"
    if prefix == "4:2:2":
        prefix = "4"
    if prefix == "5/0" or prefix == "5/7" or prefix == "58921-":
        prefix = "5"
    """
    #except StopIteration:
    #    pass
    print(first_name, prefix)
    input_fields = ('Word', 'Count')
    # infile1 is the prefix file, opened elsewhere
    reader = csv.DictReader(infile1, fieldnames=input_fields)
    # if the prefix has a frequency of x >= 2 in the prefix file, return first_name without the prefix
    # else, return first_name
    infile1.seek(0)
    #print(infile1.seek(0))
    next(reader)  # skip the header row, since fieldnames are given explicitly
    first_row = next(reader)
    # this raises StopIteration if the prefix is never found
    while prefix != first_row['Word'] and prefix[1:] != first_row['Word']:
        first_row = next(reader)
        #print(first_name, prefix)
        #print(first_row, first_name, prefix, '\t' + first_row['Word'], prefix[1:])
    if int(first_row['Count']) >= 2:  # Count is read from the CSV as a string
        length = len(prefix)
        first_name = first_name[length + 1:]  # drop the prefix and the following space
        #print("first name is ", first_name)
    return first_name
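The behavior asked for above, treating a prefix that is never found as if it occurred at least twice, can be expressed with one try/except StopIteration around the search loop instead of a growing list of special cases. This is a minimal sketch, not a drop-in replacement: it assumes the prefix file has a Word,Count header row, passes the open file in as a parameter (called infile here; the original uses a global infile1), and uses a small in-memory io.StringIO with hypothetical data as a stand-in for the real file. It keeps the original prefix[1:] comparison and the length + 1 slice that also removes the space after the prefix.

```python
import csv
import io


def remove_prefix(prefix, first_name, infile):
    """Strip `prefix` (plus the following space) from `first_name` when the
    prefix file lists it with Count >= 2, or when it is not found at all."""
    infile.seek(0)  # rewind so every call scans the whole file
    reader = csv.DictReader(infile)  # the Word,Count header row is consumed automatically
    try:
        row = next(reader)
        while prefix != row['Word'] and prefix[1:] != row['Word']:
            row = next(reader)
        frequent = int(row['Count']) >= 2  # Count is read as a string
    except StopIteration:
        # Reached the end of the file without a match: treat the prefix
        # as if it occurred at least twice, as described in the question.
        frequent = True
    if frequent:
        return first_name[len(prefix) + 1:]
    return first_name


# Hypothetical in-memory stand-in for the prefix file.
prefix_file = io.StringIO("Word,Count\nAcme,3\nZeta,1\n")
print(remove_prefix("Acme", "Acme Widgets", prefix_file))  # Widgets
print(remove_prefix("Zeta", "Zeta Labs", prefix_file))     # Zeta Labs
print(remove_prefix("Omega", "Omega Corp", prefix_file))   # Corp (not found, so stripped)
```

Catching the exception once replaces the special cases, but it also silently strips any prefix that fails to match for an unexpected reason; printing prefix inside the except branch is a cheap way to collect those for the manual clean-up mentioned above.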
This could be done much more easily by first building a list of the frequent prefixes from the file, and then checking each company name against them with the startswith method. For example:
import csv

reader = csv.DictReader(infile1)
# this assumes the file has only two columns, Word and Count, plus a header row
prefixes = [row["Word"] for row in reader if int(row["Count"]) >= 2]

def remove_prefix(first_name):
    for p in prefixes:
        if first_name.startswith(p):
            # drop the prefix and any whitespace that follows it
            return first_name[len(p):].lstrip()
    return first_name
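Wouldn't that be simpler? Another advantage is that it reads the file only once, instead of rewinding and rescanning it for every company name. And because nothing calls next() by hand past the end of the file, there is no StopIteration to catch.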
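If the prefixes are always the first word of the company name, as the question says, a set lookup on that first word avoids two pitfalls of plain startswith: a short prefix such as "A" would also match the start of "Apple Inc", and the check scans every prefix for every name. This is a minimal sketch under the assumptions that words are separated by single spaces and the file has a Word,Count header; the function names load_frequent_prefixes and strip_frequent_prefix and the sample data are hypothetical, and it does not reproduce the prefix[1:] fallback from the question.

```python
import csv
import io


def load_frequent_prefixes(infile, min_count=2):
    """Read the prefix file once and return the words with Count >= min_count."""
    reader = csv.DictReader(infile)
    return {row['Word'] for row in reader if int(row['Count']) >= min_count}


def strip_frequent_prefix(first_name, prefixes):
    """Drop the first word of first_name if it is a frequent prefix."""
    head, sep, rest = first_name.partition(' ')
    if sep and head in prefixes:  # only strip whole words, never the whole name
        return rest
    return first_name


# Hypothetical in-memory stand-in for the prefix file.
prefix_file = io.StringIO("Word,Count\nA,5\nAcme,3\nZeta,1\n")
prefixes = load_frequent_prefixes(prefix_file)
for name in ["Acme Widgets", "Apple Inc", "Zeta Labs", "Acme"]:
    print(strip_frequent_prefix(name, prefixes))
# Widgets
# Apple Inc
# Zeta Labs
# Acme
```

With startswith, "Apple Inc" would have become "pple Inc" because "A" is a frequent prefix; comparing whole first words keeps it intact.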