Ignore StopIteration


I've read a number of posts on handling the StopIteration error in Python, but I'm still having trouble with my particular case. I have a CSV file containing many prefixes, with two columns headed Word and Count, where Count is the number of times the prefix occurs. I also have a second file listing company names; the prefixes were taken from the first word of each company name in that file. I'm trying to remove duplicates, and what I want to do right now is:

Ignore the StopIteration error every time this error would occur.

In other words, instead of writing all the commented-out "if" statements below, I want a single line that says: if a StopIteration error is raised, ignore it by treating the problematic prefix as if it occurred at least twice in the prefix file, so that the company name is returned without the prefix. I realize this ignores the mismatch between the prefix stored in the prefix file and the actual prefix of the company name, but the mismatch usually comes from non-American-English letters being stored differently by Python and Excel, plus a few other causes that don't seem systematic, so I'll remove those manually later.

My code is:

def remove_prefix(prefix, first_name):
    #try:
    #EXCEPTIONS:
    #if '(' in prefix:
    #    prefix = prefix[1:]
    #if ')' in prefix:
    #    prefix = prefix[:-1]
    """
    if prefix == "2-10":
        prefix = "2"
    if prefix == "4:2:2":
        prefix = "4"
    if prefix == "5/0" or prefix == "5/7" or prefix == "58921-":
        prefix = "5"
    """
    #except StopIteration:
    #    pass

    print(first_name, prefix)
    input_fields = ('Word', 'Count')
    reader = csv.DictReader(infile1, fieldnames = input_fields)
    #if the prefix has a frequency of x >= 2 in the prefix file, then return first_name without prefix
    #else, return first_name
    infile1.seek(0)
    #print(infile1.seek(0))
    next(reader)
    first_row = next(reader)
    while prefix != first_row['Word'] and prefix[1:] != first_row['Word']:
        first_row = next(reader)
        #print(first_name, prefix)
        #print(first_row, first_name, prefix, '\t' + first_row['Word'], prefix[1:])
    if first_row['Count'] >= 2:
        length = len(prefix)
        first_name = first_name[length+1:]
    #print("first name is ", first_name)
    return first_name

Solution

  • This can be done much more simply by first building a list of the frequent prefixes from the file, and then calling the startswith method with each one. For example:

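    import csv

    # Assumes infile1 is the open prefix file and that its header row
    # names the only two columns: Word and Count
    reader = csv.DictReader(infile1)
    # Keep only the prefixes that occur at least twice
    prefixes = [row["Word"] for row in reader if int(row["Count"]) >= 2]

    def remove_prefix(first_name):
        for p in prefixes:
            if first_name.startswith(p):
                # Drop the prefix and the space that follows it
                return first_name[len(p):].lstrip()
        return first_name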
    

    Wouldn't that be simpler? Another advantage is that it reads the file only once, instead of rewinding and rescanning it for every name it processes.
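
If the function should keep its original two-argument shape and still treat a prefix that is missing from the file as a frequent one (the case that raised StopIteration), a dictionary lookup with a default is one option. This is only a minimal sketch: load_prefix_counts, the counts and threshold parameters, and the sample rows are hypothetical names and data introduced for illustration, not part of the original code.

```python
import csv
import io


def load_prefix_counts(infile):
    """Read the prefix file once and map each prefix to its count."""
    reader = csv.DictReader(infile)  # the first row supplies the headers Word, Count
    return {row["Word"]: int(row["Count"]) for row in reader}


def remove_prefix(prefix, first_name, counts, threshold=2):
    """Strip prefix (and the space after it) from first_name when the prefix
    is frequent, or when it is missing from the prefix file altogether."""
    # A missing prefix is what exhausted the reader in the row-by-row scan.
    # dict.get supplies a default instead, so it counts as a frequent prefix.
    count = counts.get(prefix, threshold)
    if count >= threshold:
        return first_name[len(prefix) + 1:]
    return first_name


# Hypothetical sample data standing in for the real prefix file.
sample = io.StringIO("Word,Count\nAcme,3\nZeta,1\n")
counts = load_prefix_counts(sample)

print(remove_prefix("Acme", "Acme Tools", counts))    # frequent prefix: stripped
print(remove_prefix("Zeta", "Zeta Labs", counts))     # rare prefix: kept
print(remove_prefix("4:2:2", "4:2:2 Films", counts))  # not in the file: stripped
```

If the row-by-row scan is kept instead, the built-in next accepts a default as its second argument, so next(reader, None) returns None at the end of the file rather than raising StopIteration.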