Several months ago, I used "pseudocorpus" to create a fake corpus as part of phrase training using Gensim with the following code:
from gensim.models.phrases import pseudocorpus
corpus = pseudocorpus(bigram_model.vocab, bigram_model.delimiter, bigram_model.common_terms)
bigrams = []
for bigram, score in bigram_model.export_phrases(corpus, bigram_model.delimiter, as_tuples=False):
    if score >= bigram_model.threshold:
        bigrams.append(bigram.decode('utf-8'))
Now when I run the code, I get the following error message:
ImportError: cannot import name 'pseudocorpus' from 'gensim.models.phrases'
I'm using Gensim 4.2.0. Is pseudocorpus() no longer available with Gensim 4.2.0?
Thanks a lot!
I believe the main internal consumer of a pseudocorpus() result, the .export_phrases() method, was improved to achieve the same goals more efficiently, so the pseudocorpus() function was removed, as it hadn't really been promoted as part of the module's public functionality.
Can you make use of .export_phrases() for your purposes?
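As a rough illustration, here is a minimal sketch of what that could look like. It assumes that in Gensim 4.x .export_phrases() takes no arguments and returns a dict mapping each phrase (a str, so no .decode() is needed) to its score, and that, as far as I understand, the model's threshold has already been applied. The helper name list_phrases and its min_score parameter are made up for this example and are not part of Gensim.

```python
def list_phrases(phrases_model, min_score=None):
    """Return the phrases a trained Phrases model has learned.

    Hypothetical helper, not part of Gensim.
    """
    # Assumption: Gensim 4.x export_phrases() needs no corpus argument and
    # returns {phrase_str: score}, already limited by the model's threshold.
    scored = phrases_model.export_phrases()
    if min_score is None:
        return list(scored)
    # Optionally apply a stricter cutoff than the model's own threshold.
    return [phrase for phrase, score in scored.items() if score >= min_score]
```

With the model from the question, this would be called as bigrams = list_phrases(bigram_model).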
If not, can you say a bit more about how you were using the (odd synthetic) 'pseudocorpus'?
If all else fails, the prior functionality was a pretty simple extraction from the model's state, and you can view the last version of the function before it was refactored away at the project's open source repository:
So, you could simply use that as a guide to reimplementing equivalent extraction in your own code.
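For a sense of how small such a reimplementation can be, here is a simplified sketch, not the original function. It assumes the model's vocab is a mapping whose keys are str tokens (as in Gensim 4.x; they were bytes in 3.x) and that multi-word candidates are stored joined by the delimiter. It deliberately skips the common-terms handling that the original performed, and candidate_phrases is a hypothetical name.

```python
def candidate_phrases(vocab, delimiter="_"):
    """Yield each multi-word vocab key split back into its unigrams.

    Simplified stand-in for the removed pseudocorpus(); ignores common terms.
    """
    for token in vocab:
        if delimiter not in token:
            continue  # plain unigram, not a phrase candidate
        yield token.split(delimiter)
```

It could be called as candidate_phrases(bigram_model.vocab, bigram_model.delimiter). Note that a unigram which itself contains the delimiter would be split as well, so treat the output as candidates only.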