I'm trying to iterate over a text file (containing several stories) and return a list of lists where each list is a new story.
read_lines_in_text(fname) is a generator that I want to iterate over to read each line in the text file. This must remain a generator.
find_title(fname) is a function that must be used and returns a list of the lines in the text where a title appears (and therefore signals the start of a new story).
The code I have written below does the job, but I think it is not a great solution.
newdict = {}
story = []
list_of_stories = []
for idx, line in enumerate(read_lines_in_text(fname)):
    if line in find_title(fname):
        newdict[idx] = line
for idx, line in enumerate(read_lines_in_text(fname)):
    if idx >= list(newdict.keys())[0]:
        if idx in newdict:
            list_of_stories.append(story)
            story = []
            story.append(line)
        else:
            story.append(line)
Given that I have the indexes of where each title occurs in the text, I would like to have something like the following:
for lines between key i and key i+1 in mydict:
    append to story
list_of_stories.append(story)
story = []
You do not need to use indices at all. Just start a new story list whenever you encounter a new title, and append the previous one to list_of_stories:
story = []
list_of_stories = []
titles = set(find_title(fname))
for line in read_lines_in_text(fname):
    if line in titles:
        # start a new story, append the previous
        if story:
            list_of_stories.append(story)
        story = [line]
    elif story:  # a story has been started
        story.append(line)
# handle the last story
if story:
    list_of_stories.append(story)
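To try the loop above without a real file, here is a minimal self-contained sketch. The SAMPLE_LINES data, the split_into_stories wrapper, and both helper bodies are hypothetical stand-ins (the real read_lines_in_text and find_title from the question read from fname, and the "all-uppercase line is a title" rule is only an assumption for the demo):

```python
# Hypothetical sample data; the real helpers would read from the file instead.
SAMPLE_LINES = [
    "preamble",
    "TITLE ONE",
    "first line",
    "second line",
    "TITLE TWO",
    "third line",
]


def read_lines_in_text(fname):
    """Stand-in generator: yields lines one at a time (fname is ignored)."""
    yield from SAMPLE_LINES


def find_title(fname):
    """Stand-in: treats all-uppercase lines as titles (an assumption)."""
    return [line for line in SAMPLE_LINES if line.isupper()]


def split_into_stories(fname):
    story = []
    list_of_stories = []
    titles = set(find_title(fname))  # built once, before the loop
    for line in read_lines_in_text(fname):
        if line in titles:
            # a new title closes the previous story, if there was one
            if story:
                list_of_stories.append(story)
            story = [line]
        elif story:  # lines before the first title are skipped
            story.append(line)
    # the last story has no following title to close it
    if story:
        list_of_stories.append(story)
    return list_of_stories


stories = split_into_stories("stories.txt")
print(stories)
```

With this sample data, "preamble" is dropped because no story has started yet, and each remaining list begins with its title line.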
When using a generator function, you really want to avoid treating it as a random access sequence with index numbers.
Note that we also avoid calling find_title(fname) on every iteration just to get the titles: it is called once, and its result is stored in the titles variable as a set of title strings for fast membership testing.
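If you would rather not build the whole list_of_stories in memory, the same single-pass idea can be written as a generator itself. This is only a sketch under assumptions: iter_stories is a hypothetical name, and it takes any iterable of lines plus a set of titles instead of calling the question's helpers directly:

```python
def iter_stories(lines, titles):
    """Yield one story (a list of lines) at a time, consuming lines only once."""
    story = []
    for line in lines:
        if line in titles:
            # a new title ends the story collected so far
            if story:
                yield story
            story = [line]
        elif story:  # ignore anything before the first title
            story.append(line)
    # emit the final story, which is not followed by another title
    if story:
        yield story


# Example with an in-memory iterator standing in for the line generator
lines = iter(["A Title", "one", "two", "B Title", "three"])
titles = {"A Title", "B Title"}
stories = list(iter_stories(lines, titles))
print(stories)
```

With the helpers from the question, this would presumably be called as list(iter_stories(read_lines_in_text(fname), set(find_title(fname)))).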