
A generator runs only once, so why can this generator run multiple times?


Hi, I'm trying to wrap my head around the concept of generators in Python, specifically when using spaCy.

As far as I understand, a generator can be run only once, and nlp.pipe(list) returns a generator so that the machine's resources are used efficiently.

The generator worked as I expected, as shown below.

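import spacy

# Assumption: the post does not show how nlp was created; a blank English pipeline is enough here
nlp = spacy.blank("en")

matches = ['one', 'two', 'three']
docs = nlp.pipe(matches)
type(docs)

# First iteration works
for x in docs:
    print(x)
# one
# two
# three

# Second iteration over the same object: nothing is printed this time
for x in docs:
    print(x)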

But a strange thing happened when I tried to make a list using the generator:

# First iteration prints things
for things in nlp.pipe(example1):
    print(things)
# a is something
# b is other thing
# c is new thing
# d is extra

# Second iteration prints things again!
for things in nlp.pipe(example1):
    print(things)
# a is something
# b is other thing
# c is new thing
# d is extra

Why does this generator never seem to run out? I tried several times, and it prints the results every time.

Thank you


Solution

  • I think you're confused because the term "generator" can be used to mean two different things in Python.

    The first thing it can mean is a "generator object", which is a kind of iterator. The docs variable you created in your first example is a reference to one of these. A generator object can only be iterated once. After that it's exhausted, and you'll need to create another one if you want to do more iteration.

    The other thing "generator" can mean is a "generator function". A generator function is a function that returns a generator object when you call it. Indeed, the term "generator" is sometimes sloppily used for functions that return iterators generally, even when that's not technically correct. A real generator function is implemented using the yield keyword, but from the caller's perspective, it doesn't really matter how the function is implemented, just that it returns some kind of iterator.

    I don't know anything about the library you're using, but it seems like nlp.pipe returns an iterator, so in at least the loosest sense it can be called a generator function. The iterator it returns is (presumably) the generator object.

    Generator objects are single-use, like all iterators are supposed to be. Generator functions, on the other hand, can be called as many times as you find appropriate (though some might have side effects). Each time you call the generator function, you'll get a new generator object. This is why your second code block works: you're calling nlp.pipe once for each loop, rather than iterating over the same iterator in both loops.
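
To make the distinction concrete without depending on spaCy, here is a minimal sketch in plain Python. The function pipe below is a hypothetical stand-in for nlp.pipe, not the library's actual implementation; it only assumes that nlp.pipe behaves like a function that hands back a fresh iterator on every call.

```python
def pipe(texts):
    """Hypothetical stand-in for nlp.pipe: a generator function (it uses yield)."""
    for text in texts:
        yield text.upper()


texts = ["one", "two", "three"]

# Case 1: call the generator function once and reuse the same generator object.
docs = pipe(texts)
first_pass = list(docs)    # consumes the generator object
second_pass = list(docs)   # the same object is already exhausted

# A generator object is its own iterator, which is why it cannot restart.
same_iterator = iter(docs) is docs

# Case 2: call the generator function again for each loop.
fresh_a = list(pipe(texts))  # a new generator object
fresh_b = list(pipe(texts))  # another new generator object

print(first_pass)          # ['ONE', 'TWO', 'THREE']
print(second_pass)         # []
print(same_iterator)       # True
print(fresh_a == fresh_b)  # True
```

If you need to loop over the results more than once, you can either call the function again, as in the second case, or store the results once with list(...) and iterate over that list, at the cost of keeping everything in memory.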