
Generator is not an iterator?


I have a generator (a function that yields stuff), but when I try to pass it to gensim.Word2Vec I get the following error:

TypeError: You can't pass a generator as the sentences argument. Try an iterator.

Isn't a generator a kind of iterator? If not, how do I make an iterator from it?

Looking at the library code, it seems to simply iterate over sentences with something like for x in enumerate(sentences), which works just fine with my generator. What is causing the error, then?


Solution

  • A generator is exhausted after one pass over it. Word2vec needs to traverse sentences multiple times, and a generator cannot support that: it is a one-way stream from which you can only take the next item, and once it has been consumed it stays empty. Word2vec therefore needs something that can be iterated repeatedly, like a list.

    In particular, their code calls two different functions that both iterate over sentences (so if you pass a generator, the second one runs on an empty sequence):

    self.build_vocab(sentences, trim_rule=trim_rule)
    self.train(sentences)
    

    It should work with any object that implements __iter__ and is not a GeneratorType. So wrap your function in an iterable interface and make sure it can be traversed multiple times, meaning that

    sentences = your_code  # your iterable object
    for s in sentences:
        print(s)
    for s in sentences:
        print(s)
    

    prints your collection twice.
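The claims in the answer, that a generator is exhausted after one pass while a list can be traversed again, are easy to check directly. The sketch below uses only the standard library: read_sentences is a hypothetical stand-in for the asker's generator function, and the two list comprehensions merely imitate the build_vocab() and train() passes rather than calling gensim. It also shows that a generator really is an iterator (iter() returns the generator itself), so the error plausibly comes from an explicit type check against types.GeneratorType, as the answer suggests, not from a failure to iterate.

```python
import types

def read_sentences():
    """Hypothetical generator function standing in for the asker's code."""
    for line in ["the cat sat", "the dog ran"]:
        yield line.split()

gen = read_sentences()

# A generator IS an iterator: iter() hands back the very same object...
print(iter(gen) is gen)                         # True
# ...and it is exactly the type a GeneratorType check rejects.
print(isinstance(gen, types.GeneratorType))     # True

# Two consecutive passes, imitating build_vocab() followed by train().
first_pass = [s for s in gen]
second_pass = [s for s in gen]
print(len(first_pass), len(second_pass))        # 2 0  (second pass is empty)

# A list is re-iterable: every for-loop calls iter() and gets a fresh iterator.
sentences = list(read_sentences())
print(len([s for s in sentences]), len([s for s in sentences]))  # 2 2
print(isinstance(sentences, types.GeneratorType))                # False
```

Converting to a list works, but it keeps the whole corpus in memory; a re-iterable wrapper avoids that.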
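A minimal sketch of the iterable wrapper the answer describes, assuming your generator function can simply be called again to start a fresh pass (for example because it re-opens a file or re-reads a list). RestartableSentences and read_sentences are hypothetical names chosen for this illustration. Because __iter__ calls the generator function anew each time, every for-loop gets an unexhausted generator, while the wrapper object itself is not a GeneratorType.

```python
import types

def read_sentences(lines):
    """Hypothetical generator function; replace with your own."""
    for line in lines:
        yield line.split()

class RestartableSentences:
    """Hypothetical iterable wrapper that re-creates the generator on every pass."""

    def __init__(self, generator_function, *args, **kwargs):
        # Store the function and its arguments, not a generator object,
        # so that a new generator can be made for each traversal.
        self.generator_function = generator_function
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        # Each call returns a brand-new generator, i.e. a fresh iterator.
        return self.generator_function(*self.args, **self.kwargs)

sentences = RestartableSentences(read_sentences, ["the cat sat", "the dog ran"])

# Traversing twice prints the whole collection twice.
for s in sentences:
    print(s)
for s in sentences:
    print(s)

# The wrapper is not a generator, so a GeneratorType check lets it through.
print(isinstance(sentences, types.GeneratorType))  # False
```

The wrapper instance could then be passed where the generator was, e.g. gensim.models.Word2Vec(sentences); that call is not executed here. An equivalent design is to write __iter__ itself as a generator method containing the yield loop; either way, what matters is that each call to __iter__ starts over from the beginning.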