I want to use phrases in doc2vec, and I am using gensim.phrases. Doc2vec needs tagged documents to train the model, and I cannot tag the phrases. How can I do this?
Here is my code:
text = phrases.Phrases(text)
for i in range(len(text)):
    string1 = "SENT_" + str(i)
    sentence = doc2vec.LabeledSentence(tags=string1, words=text[i])
    text[i] = sentence
print "Training model..."
model = Doc2Vec(text, workers=num_workers,
                size=num_features, min_count=min_word_count,
                window=context, sample=downsampling)
The invocation of Phrases() trains a phrase-creating model. You later apply that model to text to get back phrase-combined text.
Don't replace your original text with the trained model, as on your code's first line. Also, don't try to assign into the Phrases model, as happens in your current loop, nor access the Phrases model by integers.
The gensim docs for the Phrases class have examples of the proper use of the Phrases class; if you follow that pattern you'll do well.
Further, note that LabeledSentence has been replaced by TaggedDocument, and its tags argument should be a list of tags. If you provide a string, it will see that as a list of one-character tags (instead of the one tag you intend).