What Metrics Are Used in the Output of Gensim's evaluate_word_pairs?

Gensim offers an evaluate_word_pairs function for evaluating semantic similarity.

Here is an example from its documentation page:

model.wv.evaluate_word_pairs(datapath('wordsim353.tsv'))

Out:
((0.1014236962315867, 0.44065378924434523), SpearmanrResult(correlation=0.07441989763914543, pvalue=0.5719973648460552), 83.0028328611898)

I would like to know which metric produces each value (0.1014236962315867, 0.44065378924434523, ...) in the output.

Solution

Per the documentation for evaluate_word_pairs():

Returns

pearson (tuple of (float, float)) – Pearson correlation coefficient with 2-tailed p-value.

spearman (tuple of (float, float)) – Spearman rank-order correlation coefficient between the similarities from the dataset and the similarities produced by the model itself, with 2-tailed p-value.

oov_ratio (float) – The ratio of pairs with unknown words.

Per your output, it looks like the Pearson result is returned as a plain tuple, while the Spearman result is returned as a named tuple (SpearmanrResult). In both cases, the correlation coefficient comes first, followed by the p-value.
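As a minimal sketch of how to take that return value apart, the snippet below unpacks the exact numbers from your output. The SpearmanrResult defined here is only a stand-in named tuple so the example runs without gensim or SciPy installed; in real use you would unpack the value returned by model.wv.evaluate_word_pairs(...) directly.

```python
from collections import namedtuple

# Stand-in for the named tuple seen in the output (assumption: defined here
# only so the sketch is self-contained).
SpearmanrResult = namedtuple("SpearmanrResult", ["correlation", "pvalue"])

# The values copied from the output in the question.
result = (
    (0.1014236962315867, 0.44065378924434523),
    SpearmanrResult(correlation=0.07441989763914543, pvalue=0.5719973648460552),
    83.0028328611898,
)

# Three top-level items: Pearson, Spearman, out-of-vocabulary figure.
pearson, spearman, oov_ratio = result

# Both correlation results unpack as (coefficient, p-value).
pearson_r, pearson_p = pearson
spearman_rho, spearman_p = spearman

print(f"Pearson r    = {pearson_r:.4f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_rho:.4f} (p = {spearman_p:.4f})")
print(f"OOV          = {oov_ratio:.1f}")
```

Positional unpacking works for both the plain tuple and the named tuple, so you don't need to care which type you get back.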

Note that oov_ratio is reported as a percentage rather than a fraction between 0 and 1: the 83.0 in your output means that about 83% of the test pairs contained a word that wasn't known to the model.
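To make that figure concrete, here is a small sketch of how such a percentage can be computed. It is an illustration of the idea, not gensim's actual code: oov_percentage, vocab, and pairs are hypothetical names, and the word list is made up.

```python
def oov_percentage(pairs, vocab):
    """Percentage of pairs in which at least one word is missing from vocab.

    Hypothetical helper for illustration only; not part of gensim.
    """
    if not pairs:
        raise ValueError("no word pairs given")
    # A pair counts as out-of-vocabulary if either of its words is unknown.
    unknown = sum(1 for a, b in pairs if a not in vocab or b not in vocab)
    return 100.0 * unknown / len(pairs)


# Toy vocabulary and test pairs (made up for this example).
vocab = {"tiger", "cat", "car", "book"}
pairs = [
    ("tiger", "cat"),
    ("car", "automobile"),  # "automobile" is unknown
    ("book", "paper"),      # "paper" is unknown
    ("cat", "car"),
]

pct = oov_percentage(pairs, vocab)
print(f"{pct:.1f}% of pairs contain an unknown word")
```

A value as high as 83 suggests the correlations were computed on a small fraction of the test set, which is worth keeping in mind when reading the p-values.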

Consult other references for definitions/explanations of the Pearson and Spearman coefficients/p-values.
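For a quick intuition in the meantime, here is a rough pure-Python sketch of the two coefficients (without p-values). It assumes the textbook definitions: Pearson measures linear agreement between the raw scores, and Spearman is Pearson applied to the ranks of the scores. The function names and the toy scores are made up for illustration; in practice you would normally rely on a statistics library rather than code like this.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data: human similarity ratings vs. model similarities for five pairs.
# The model orders the pairs exactly like the humans, but not linearly.
gold_scores = [1.0, 3.0, 5.0, 7.0, 9.0]
model_scores = [0.01, 0.04, 0.09, 0.16, 0.25]

r = pearson(gold_scores, model_scores)
rho = spearman(gold_scores, model_scores)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

On this toy data the ordering matches perfectly, so Spearman comes out at essentially 1, while Pearson is somewhat lower because the relationship is not a straight line.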