I have created word embeddings (Word2vec) from my own dataset using the Gensim module, and I want to evaluate them.
I used the WordSim-353 dataset to evaluate the word embeddings. The following code shows the result of the evaluation.
Code:
from gensim.test.utils import datapath

# 'model' is the Word2Vec model trained on my own dataset
similarities = model.wv.evaluate_word_pairs(datapath('wordsim353.tsv'))
print(similarities)
Result:
((0.09410256722489568, 0.3086953732794174), SpearmanrResult(correlation=0.06101508426787973, pvalue=0.5097769955392246), 66.28895184135978)
How can I interpret this result?
The way we evaluate the quality of word embeddings is to see how closely the similarities computed from the embeddings match the similarities assigned by human judgement.
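For reference, evaluate_word_pairs returns a 3-tuple, as your printed output shows: the Pearson correlation with its p-value, a SpearmanrResult holding the Spearman rank correlation and its p-value, and the percentage of word pairs skipped because a word was out of vocabulary. A minimal sketch of unpacking it (assuming model is your trained Word2Vec model):

from gensim.test.utils import datapath

# evaluate_word_pairs returns (pearson, spearman, oov_ratio)
pearson, spearman, oov_ratio = model.wv.evaluate_word_pairs(datapath('wordsim353.tsv'))

print(f"Pearson correlation:  {pearson[0]:.4f} (p-value: {pearson[1]:.4g})")
print(f"Spearman correlation: {spearman.correlation:.4f} (p-value: {spearman.pvalue:.4g})")
print(f"Out-of-vocabulary ratio: {oov_ratio:.1f}%")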
In your result, the correlations themselves are very weak (Pearson ≈ 0.09, Spearman ≈ 0.06), and their p-values are far too high (≈ 0.31 and ≈ 0.51), meaning correlations that weak could easily have occurred by chance. The third number, 66.29, is the out-of-vocabulary ratio: about 66% of the WordSim-353 pairs were skipped because at least one word was missing from your model's vocabulary. I suggest you use pretrained word embeddings or collect a larger dataset.
I tried evaluating with glove-twitter-25 and got very good (low) p-values.
import gensim.downloader as api
from gensim.test.utils import datapath

# load pretrained 25-dimensional GloVe vectors trained on Twitter data
m = api.load("glove-twitter-25")
m.evaluate_word_pairs(datapath("wordsim353.tsv"))
output:
((0.36409317297819943, 2.969053896450154e-12), SpearmanrResult(correlation=0.36452011505868487, pvalue=2.788781738485533e-12), 2.26628895184136)
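Here both correlations are around 0.36 with p-values around 3e-12, i.e. statistically significant, and the out-of-vocabulary ratio is only about 2.3%, so almost every test pair was covered. If you want missing pairs to count against the score instead of being skipped, evaluate_word_pairs accepts a dummy4unknown flag; a minimal sketch:

# Score OOV pairs as 0.0 similarity instead of skipping them,
# so a small vocabulary lowers the correlation rather than hiding it.
pearson, spearman, oov_ratio = m.evaluate_word_pairs(
    datapath("wordsim353.tsv"), dummy4unknown=True
)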