I have a function that reads a file line by line and returns each line as a list of words. Since the file is very large, I would like to make it a generator.
Here is the function:
def tokenize_each_line(file):
    with open(file, 'r') as f:
        for line in f:
            yield line.split()
However, every time I call next(tokenize_each_line()), it returns the first line of the file. I guess this is not the expected behavior for generators. Instead, I'd like each call to return the next line.
Calling the function tokenize_each_line() returns a newly initialized generator. So next(tokenize_each_line()) creates a fresh generator, which opens the file again from the start, and makes it yield its first item (the first line of the file).
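A minimal sketch of this behavior, assuming a small throwaway file created with tempfile (the sample contents are made up purely so the snippet runs on its own):

```python
import os
import tempfile


def tokenize_each_line(file):
    with open(file, 'r') as f:
        for line in f:
            yield line.split()


# Hypothetical sample file, only so the demo is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("alpha beta\ngamma delta\nepsilon\n")
    path = tmp.name

# Each call builds a brand-new generator, so next() always sees line 1.
first_try = next(tokenize_each_line(path))
second_try = next(tokenize_each_line(path))

# Keeping a single generator around lets next() advance through the file.
gen = tokenize_each_line(path)
line1 = next(gen)
line2 = next(gen)
gen.close()  # stop early: the with block exits and the file is closed

os.remove(path)

print(first_try, second_try)  # ['alpha', 'beta'] ['alpha', 'beta']
print(line1, line2)           # ['alpha', 'beta'] ['gamma', 'delta']
```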
Instead, initialize the generator once, hold a reference to it, and call next on it according to your requirements.
For example:
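gen = tokenize_each_line('myfile.txt')
# just as an example of how you might want to use the generator
words = []
while len(words) < 1000:
    line_words = next(gen, None)  # None once the file is exhausted
    if line_words is None:
        break  # file had fewer than 1000 words; avoid StopIteration
    words += line_words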
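If the goal is specifically the first 1000 words, note that a loop like the one above can overshoot, since it adds a whole line at a time. One alternative is a sketch built on the standard-library itertools.chain.from_iterable and itertools.islice; first_n_words is a hypothetical helper name, and the temporary sample file exists only so the example runs on its own:

```python
import os
import tempfile
from itertools import chain, islice


def tokenize_each_line(file):
    with open(file, 'r') as f:
        for line in f:
            yield line.split()


def first_n_words(path, n):
    """Return at most n words from path, reading the file lazily."""
    gen = tokenize_each_line(path)
    try:
        # chain.from_iterable flattens the per-line lists into one word stream;
        # islice stops after n words and simply ends early if the file is shorter.
        return list(islice(chain.from_iterable(gen), n))
    finally:
        gen.close()  # release the file handle even if we stopped mid-file


# Hypothetical sample file, only so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("a b c\nd e\nf\n")
    path = tmp.name

few = first_n_words(path, 4)
everything = first_n_words(path, 100)
os.remove(path)

print(few)         # ['a', 'b', 'c', 'd']
print(everything)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Because islice never pulls more items than requested, only as much of the file is read as is needed to produce n words.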