python, string-parsing

Python file parsing -> IndexError


I am parsing through an ISI file with a few hundred records that all begin with a 'PT J' tag and end with an 'ER' tag. I am trying to pull the tagged info from each record within a nested loop but keep getting an IndexError. I know why I am getting it, but does anyone have a better way of identifying the start of new records than checking the first few characters?

    while file:
        while line[1] + line[2] + line[3] + line[4] != 'PT J':
            ...
            # search through and record data from tags
            ...

I use this same method to identify the individual tags, and I occasionally hit the same problem there, so any suggestions for that would also be greatly appreciated!

Here is some sample data. Note that not every record includes every tag:

    PT J
    AF Bob Smith
    TI Python For Dummies
    DT July 4, 2012
    ER

    PT J
    TI Django for Dummies
    DT 4/14/2012
    ER

    PT J
    AF Jim Brown
    TI StackOverflow
    ER

Solution

  • Do the 'ER' lines contain only 'ER'? That would explain your IndexErrors: such a line is too short for line[4] to exist, so indexing it fails. (The blank lines between records are even shorter.)

    The first thing to try would be:

    while not line.startswith('PT J'):
    

    instead of your existing while loop.

    Also, you can use a slice instead of adding up individual characters:

    line[1] + line[2] + line[3] + line[4] == line[1:5] 
    

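    (The end index of a slice is exclusive. Unlike indexing, slicing past the end of a string doesn't raise IndexError; it just returns a shorter string. Note too that Python indexes from 0, so unless your lines start with an extra character, 'PT J' is line[0:4], not line[1:5].)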
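A quick way to see the difference between indexing, slicing and `startswith()` on a short line is this small snippet. It assumes the 'ER' line is read with its trailing newline, as iterating over a file object gives you; the variable name `line` just mirrors the question.

```python
line = 'ER\n'  # an 'ER' line as read from the file: only 3 characters long

try:
    # The question's original test: line[3] is already past the end.
    line[1] + line[2] + line[3] + line[4]
except IndexError as exc:
    print('indexing failed:', exc)

print(repr(line[1:5]))          # slicing stops at the end of the string
print(line.startswith('PT J'))  # False, and no exception either way
```

Only the first approach raises; the slice quietly returns the shorter string `'R\n'`, and `startswith()` simply returns `False`.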
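Putting these suggestions together, here is a minimal sketch of a record parser for the sample data in the question. It is not a full ISI parser: `parse_isi` is a hypothetical name, and it assumes every data line is a two-character tag, a space, then the value, and that 'PT J' opens each record and a line containing just 'ER' closes it. It also assumes (this is about the wider ISI format, not something shown in the sample) that a line starting with a space continues the previous tag's value. Each record becomes a dict mapping tags to lists of values, so missing tags are simply absent.

```python
import io


def parse_isi(lines):
    """Group ISI-style tagged lines into one dict per record.

    Each dict maps a two-character tag (e.g. 'TI') to a list of the values
    seen under it; tags missing from a record are simply absent.
    """
    records = []
    current = None   # record being built; None while between records
    last_tag = None  # most recent tag, used for continuation lines
    for raw in lines:
        line = raw.rstrip('\r\n')
        if not line.strip():
            continue  # blank line between records: nothing to index
        if line.startswith('PT J'):
            current, last_tag = {}, None  # a new record starts here
        elif line.rstrip() == 'ER':
            if current is not None:
                records.append(current)  # the record is complete
            current, last_tag = None, None
        elif current is None:
            continue  # header/footer lines outside any record
        elif line.startswith(' '):
            # Assumption: an indented line continues the previous tag's value.
            if last_tag is not None:
                current[last_tag].append(line.strip())
        else:
            # Slices never raise IndexError, even on unexpectedly short lines.
            last_tag, value = line[:2], line[3:]
            current.setdefault(last_tag, []).append(value)
    return records


sample = """PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER

PT J
TI Django for Dummies
DT 4/14/2012
ER

PT J
AF Jim Brown
TI StackOverflow
ER
"""

for rec in parse_isi(io.StringIO(sample)):
    # .get() copes with records that lack a tag, such as the second one's AF.
    print(rec.get('TI'), rec.get('AF'))
```

For the sample data this prints `['Python For Dummies'] ['Bob Smith']`, then `['Django for Dummies'] None`, then `['StackOverflow'] ['Jim Brown']`. With a real file you would pass the open file object instead of `io.StringIO(sample)`. Storing lists rather than single strings leaves room for tags that span several lines, such as multiple authors.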