I am parsing an ISI file with a few hundred records that all begin with a 'PT J' tag and end with an 'ER' tag. I am trying to pull the tagged info from each record within a nested loop but keep getting an IndexError. I know why I am getting it, but does anyone have a better way of identifying the start of new records than checking the first few characters?
while file:
    while line[1] + line[2] + line[3] + line[4] != 'PT J':
        ...
        # Search through and record data from tags
        ...
I am using this same method and therefore occasionally getting the same problem with identifying tags, so if you have any suggestions for that as well I would greatly appreciate it!
Sample data, which you'll notice does not always include every tag for each record, is:
PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER
PT J
TI Django for Dummies
DT 4/14/2012
ER
PT J
AF Jim Brown
TI StackOverflow
ER
Do the 'ER' lines only contain 'ER'? That would be why you're getting IndexErrors: on a line that short, indices such as line[4] don't exist.
The first thing to try would be:
while not line.startswith('PT J'):
instead of your existing while loop.
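For what it's worth, here is a minimal sketch of how the whole loop could look with startswith, reading the file in a single pass. The function name parse_records is made up for illustration, and it assumes each field fits on one line in the form of a tag, a space, then the value, as in your sample data. Real ISI exports may have continuation lines that this does not handle.

```python
def parse_records(lines):
    """Group tagged lines into one dict per record (hypothetical helper)."""
    records = []
    current = None  # dict for the record being read, or None between records
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("PT J"):
            current = {}  # a new record begins
        elif line.startswith("ER"):
            if current is not None:
                records.append(current)  # the record is complete
            current = None
        elif current is not None and line.strip():
            # Assumption: "TAG value" on a single line
            tag, _, value = line.partition(" ")
            current[tag] = value.strip()
    return records


sample = """PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER
PT J
TI Django for Dummies
DT 4/14/2012
ER
PT J
AF Jim Brown
TI StackOverflow
ER
"""

records = parse_records(sample.splitlines())
for record in records:
    # .get() copes with records that are missing a tag
    print(record.get("AF", "(no author)"), "|", record.get("TI"))
```

Because each record ends up as a dict, a missing tag is simply an absent key, so nothing has to be indexed by position.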
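Also, slices:
line[1] + line[2] + line[3] + line[4] == line[1:5]
(The end of a slice is non-inclusive. A slice that runs past the end of the string returns a shorter string instead of raising an IndexError. Since indexing starts at 0, the first four characters are actually line[0:4].)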
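A quick sketch of how indexing and slicing behave on a short line. The variable names are only for illustration, and it assumes the 'ER' line consists of just those two characters plus a newline:

```python
line = "ER\n"  # assumed shape of an end-of-record line

# Indexing past the end of the string raises IndexError...
try:
    line[1] + line[2] + line[3] + line[4]
    indexing_failed = False
except IndexError:
    indexing_failed = True

# ...while a slice simply returns whatever characters exist.
short_slice = line[1:5]

header = "PT J\n"
first_four = header[:4]      # indices 0 to 3, i.e. 'PT J'
shifted = header[1:5]        # indices 1 to 4, which skips the 'P'

print(indexing_failed, repr(short_slice), repr(first_four), repr(shifted))
```

So a slice comparison such as line[:4] == 'PT J' is safe on short lines, although startswith says the same thing more directly.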