I have used the gensim Word2Vec model and applied it to my list of documents. The word embeddings are being created. I want to know whether Word2Vec is performing well on my list of documents. Are there any metrics to measure that? How can I tell whether Word2Vec has really worked well on my document corpus, or whether I should try a different embedding? Below is the gensim code I used.
import gensim
# Note: in gensim 4.0+, the `size` parameter is named `vector_size`.
model = gensim.models.Word2Vec(documents, size=150, window=10, min_count=2, sg=1, workers=10)
There's no universal definition of "performing well". It depends on your end-goals.
Why do you want to create word-vectors? What value do you expect them to provide?
With the answers to those questions, you can first review the results in an informal, ad-hoc fashion: look at some words' nearest neighbors (the results of wordvecs.most_similar(query_word)) to see if they make sense to you, for your needs and problem domain.
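As a rough illustration of that kind of spot check, here is a minimal sketch that ranks neighbors by cosine similarity. The `toy_vectors` dict, the probe words, and the helper names are made up for the example; with a trained gensim model you would call `most_similar` on the model's word vectors instead of using these helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(vectors, query, topn=3):
    """Return the topn (word, similarity) pairs closest to `query`."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical stand-in for trained word vectors (word -> vector).
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
    "truck": [0.1, 0.8, 0.5],
}

# Eyeball the neighbors of a few probe words from your own domain.
for probe in ["cat", "car"]:
    print(probe, nearest_neighbors(toy_vectors, probe, topn=2))
```

Pick probe words that matter for your problem domain; if their neighbors look unrelated, that is an early sign the corpus or parameters need work.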
But to really test whether your models are doing better over time, as you improve your data or model parameters, you should form some repeatable, quantitative tests that match your end-goal. (For example: do you need certain pairs of words to be closer to each other than to some third word? Do you use the word-vectors as input to some other classification or info-retrieval process that has some known, desirable results?)
Run those tests to score each model, then compare one model's score against another.
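One possible shape for such a test is a triplet check: for each (anchor, closer, farther) triplet you write by hand, count how often the model ranks `closer` above `farther`. The sketch below is only an illustration; the triplets, the two similarity tables, and the helper names are invented. With gensim, you could pass the word vectors' `similarity` method (for example `model.wv.similarity`) as the `similarity` argument.

```python
def triplet_accuracy(similarity, triplets):
    """Fraction of triplets where sim(anchor, closer) > sim(anchor, farther)."""
    correct = 0
    for anchor, closer, farther in triplets:
        if similarity(anchor, closer) > similarity(anchor, farther):
            correct += 1
    return correct / len(triplets)

def make_similarity(table):
    """Build a similarity function from a table of unordered word pairs."""
    def similarity(a, b):
        return table[frozenset((a, b))]
    return similarity

# Hand-written expectations for the domain (hypothetical examples).
triplets = [
    ("cat", "dog", "car"),    # "cat" should be nearer "dog" than "car"
    ("car", "truck", "dog"),  # "car" should be nearer "truck" than "dog"
]

# Made-up similarity scores standing in for two trained models.
model_a = make_similarity({
    frozenset(("cat", "dog")): 0.9, frozenset(("cat", "car")): 0.2,
    frozenset(("car", "truck")): 0.8, frozenset(("car", "dog")): 0.3,
})
model_b = make_similarity({
    frozenset(("cat", "dog")): 0.4, frozenset(("cat", "car")): 0.5,
    frozenset(("car", "truck")): 0.7, frozenset(("car", "dog")): 0.1,
})

score_a = triplet_accuracy(model_a, triplets)
score_b = triplet_accuracy(model_b, triplets)
print("model A:", score_a, "model B:", score_b)
```

Keeping the triplet list fixed while you vary the corpus or parameters gives you a single number per model that can be compared across runs.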