I have some data that I have stored in a list and if I print out the list I see the following:
.
.
.
007 A000000 Y
007 B000000 5
007 C010100 1
007 C020100 ACORN FUND
007 C030100 N
007 C010200 2
007 C020200 ACORN INTERNATIONAL
007 C030200 N
007 C010300 3
007 C020300 ACORN USA
007 C030300 N
007 C010400 4
.
.
.
The dots before and after the sequence indicate that there is other, similarly structured data that may or may not be part of this seventh item (007). If the first value in the seventh item is '007 A000000 Y', then I want to create a dictionary of some of the data items. I can do this, and have done so, by running through all of the items in my list and comparing their values to some test values for the variables. For instance, with code like:
if dataLine.find('007 B') == 0:
    numberOfSeries = int(dataLine.split()[2])
What I want to do though is
if dataLine.find('007 A000000 Y') == 0:
    # READ THE NEXT LINE RIGHT HERE
Right now I am having to iterate through the entire list for each cycle.
I want to shorten the processing because I have about 60K files with between 500 and 5,000 lines each.
I have thought about creating another reference to the list and counting the data lines until dataLine.find('007 A000000 Y') == 0, but that does not seem like the most elegant solution.
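One way to "read the next line right here" without counting positions is to loop over an explicit iterator and call next() on it when the marker is found. This is only a sketch: the helper name read_after_marker and the sample list are made up for illustration, and it assumes each entry is a plain string like the ones printed above.

```python
def read_after_marker(lines, marker="007 A000000 Y"):
    """Return the line right after the first line that starts with marker.

    Hypothetical helper for illustration; returns None if the marker is
    missing or is the last line.
    """
    it = iter(lines)
    for line in it:
        if line.startswith(marker):
            # next() advances the same iterator the for loop is using,
            # so this is the line immediately after the marker.
            return next(it, None)
    return None


# Made-up sample data in the same layout as the listing above.
sample = [
    "006 Z000000 N",
    "007 A000000 Y",
    "007 B000000 5",
    "007 C010100 1",
]

following = read_after_marker(sample)
number_of_series = int(following.split()[2]) if following else None
```

Because the for loop and next() share one iterator, the list is walked only once and stops as soon as the marker is found.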
Okay, while I was Googling to make sure I had covered my bases, I came across a solution:
I find that I forget to think in Lists and Dictionaries even though I use them. Python has some powerful tools to work with these types to speed your ability to manipulate them.
I need a slice so the slice references are easily obtained by
beginPosit = tempans.index('007 A000000 Y')
endPosit = min([i for i, item in enumerate(tempans) if '008 ' in item])
where tempans is the data list. Now I can write
for line in tempans[beginPosit:endPosit]:
    # process each line
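Putting the slice together with the dictionary I was after might look roughly like the sketch below. The sample list is made up, the name item_seven is just for illustration, and I am assuming every line is "item code value" separated by whitespace. It also swaps min([...]) for next() on a generator, which stops at the first '008 ' line instead of scanning the whole list, and uses startswith so that an '008 ' appearing in the middle of a value cannot match.

```python
# Made-up sample data in the same layout as the listing above.
tempans = [
    "006 Z000000 N",
    "007 A000000 Y",
    "007 B000000 5",
    "007 C010100 1",
    "007 C020100 ACORN FUND",
    "007 C030100 N",
    "008 A000000 N",
]

begin_posit = tempans.index("007 A000000 Y")

# First index whose line starts item 008; fall back to the end of the
# list if there is no such line.
end_posit = next(
    (i for i, item in enumerate(tempans) if item.startswith("008 ")),
    len(tempans),
)

item_seven = {}
for line in tempans[begin_posit:end_posit]:
    # Split into at most three parts so values containing spaces
    # (e.g. "ACORN FUND") stay in one piece.
    _, code, value = line.split(None, 2)
    item_seven[code] = value
```

Note that tempans.index raises ValueError if '007 A000000 Y' is not in the list, which doubles as the check that the seventh item is flagged 'Y'.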
I think I answered my own question. I learned a lot from the other answers and appreciate them, but I think this is what I needed.
Okay I am going to further edit my answer. I have learned a lot here but some of this stuff is over my head still and I want to get some code written while I am learning more about this fantastic tool.
from itertools import takewhile
beginPosit = tempans.index('007 A000000 Y')
new=takewhile(lambda x: '007 ' in x, tempans[beginPosit:])
This is based on an earlier answer to a similar question and Steven Huwig's answer
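For completeness, here is a small runnable version of the takewhile idea. The sample list is made up, and I use startswith rather than in so that a '007 ' appearing inside a value on a later item's line would not keep the loop going; that is my own tweak, not part of the answers I borrowed from.

```python
from itertools import takewhile

# Made-up sample data in the same layout as the listing above.
tempans = [
    "006 Z000000 N",
    "007 A000000 Y",
    "007 B000000 5",
    "007 C010100 1",
    "008 A000000 N",
    "008 B000000 2",
]

begin_posit = tempans.index("007 A000000 Y")

# takewhile is lazy: it yields lines until the first one that fails the
# test, then stops. list() is only needed to keep the result around.
item_seven = list(
    takewhile(lambda x: x.startswith("007 "), tempans[begin_posit:])
)
```

Unlike the slice version, this does not need an end position, so it also works when item 007 is the last item in the file.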