I have a generator (a function that yields values), but when I try to pass it to gensim.Word2Vec
I get the following error:
TypeError: You can't pass a generator as the sentences argument. Try an iterator.
Isn't a generator a kind of iterator? If not, how do I make an iterator from it?
Looking at the library code, it seems to simply iterate over sentences with something like for x in enumerate(sentences), which works just fine with my generator. What is causing the error, then?
A generator is exhausted after a single pass over it. Word2Vec needs to traverse the sentences multiple times (and probably also needs to get an item at a given index, which is not possible with generators: they behave like a stack you can only pop from), so it requires something more solid, like a list.
In particular, their code calls two different functions that both iterate over sentences (so if you pass a generator, the second one would run on an empty sequence):
self.build_vocab(sentences, trim_rule=trim_rule)
self.train(sentences)
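To see the exhaustion problem in isolation, here is a minimal sketch that makes two passes over the same generator object, much like the two calls above would. The function read_sentences is a hypothetical stand-in for your own generator function, and the two list() calls only imitate the two passes; they are not gensim code.

```python
def read_sentences():
    # Hypothetical stand-in for a real corpus reader that yields tokenized sentences.
    yield ["hello", "world"]
    yield ["goodbye", "world"]


sentences = read_sentences()    # a generator object

first_pass = list(sentences)    # first traversal consumes every item
second_pass = list(sentences)   # second traversal finds nothing left to yield

print(len(first_pass))    # 2
print(len(second_pass))   # 0
```

The second pass silently sees an empty sequence, which is presumably why the library rejects generators up front instead of training on nothing.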
It should work with anything that implements __iter__ and is not a GeneratorType. So wrap your function in an iterable interface and make sure that you can traverse it multiple times, meaning that
sentences = your_code
for s in sentences:
    print(s)
for s in sentences:
    print(s)
prints your collection twice.
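One way to build such an iterable interface is a small class whose __iter__ calls your generator function again on every traversal. This is only a sketch under assumptions: SentenceStream and read_sentences are hypothetical names chosen for illustration, not part of gensim.

```python
class SentenceStream:
    """Re-iterable wrapper: every call to __iter__ starts a fresh generator."""

    def __init__(self, generator_function, *args, **kwargs):
        # Store the function itself (not a generator object) plus its arguments.
        self.generator_function = generator_function
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        # Calling the function again creates a new, unexhausted generator.
        return self.generator_function(*self.args, **self.kwargs)


def read_sentences():
    # Hypothetical stand-in for a real corpus reader.
    yield ["hello", "world"]
    yield ["goodbye", "world"]


sentences = SentenceStream(read_sentences)

first_pass = list(sentences)
second_pass = list(sentences)

print(first_pass == second_pass)   # True
```

Because the wrapper is a plain object with __iter__ rather than a generator, it can be traversed as many times as needed while still reading the data lazily, so it should be acceptable as the sentences argument without loading everything into a list.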