Generator always returning same value


I have a function that reads a file line by line and returns each line as a list of words. Since the file is very large, I would like to make it a generator.

Here is the function:

def tokenize_each_line(file):
   with open(file, 'r') as f:
      for line in f:
         yield line.split()

However, every time I call next(tokenize_each_line()), it returns the first line of the file. I assume this is not the expected behavior for generators. Instead, I would like each call to return the next line.


Solution

  • Each call to tokenize_each_line() returns a newly initialized generator. So next(tokenize_each_line()) creates a fresh generator, makes it yield its first item (the first line of the file), and then discards it.

    Instead, create the generator once, hold a reference to it, and call next on that same object whenever you need another line.

    For example:

    gen = tokenize_each_line('myfile.txt')
    
    # Just an example of how you might use the generator.
    # Note: next(gen) raises StopIteration if the file runs out of lines first.
    words = []
    while len(words) < 1000:
        words += next(gen)
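
The difference between the two call patterns can be checked with a small self-contained sketch. The sample file contents and the variable names `repeated`, `advancing`, and `remaining` are made up for illustration; only `tokenize_each_line` comes from the question. It also shows `itertools.islice` as one possible way to take a bounded number of lines without hitting StopIteration at end of file.

```python
import os
import tempfile
from itertools import islice


def tokenize_each_line(file):
    # Same generator as in the question.
    with open(file, 'r') as f:
        for line in f:
            yield line.split()


# Hypothetical sample file, created only so this sketch runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line here\nthird\n")
    path = tmp.name

# Each call builds a new generator, so both results are the first line.
repeated = [next(tokenize_each_line(path)), next(tokenize_each_line(path))]

# One generator, held in a variable, advances on every next().
gen = tokenize_each_line(path)
advancing = [next(gen), next(gen)]

# islice takes at most 10 more lines and stops quietly at end of file,
# whereas a bare next(gen) would raise StopIteration there.
remaining = list(islice(gen, 10))

os.remove(path)  # clean up the temporary file

print(repeated)
print(advancing)
print(remaining)
```

If the goal is simply to process every line, a plain `for tokens in tokenize_each_line(path):` loop is usually enough and handles the end of the file for you.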