
Spacy - entity linker - why is the predict score a combination of prob and cosine sim?


I was going through the predict method of the entity linker pipe in spaCy, and the score is defined as follows:

scores = prior_probs + sims - (prior_probs*sims)

Link here

Does anybody have experience with this or know where this formula comes from?

Thanks!


Solution

  • It is taken from "Entity Linking via Joint Encoding of Types, Descriptions, and Context", Section 4, Equation 2.

    I don't feel confident enough to explain the formula in detail, but overall its purpose is to combine two scores for each entity candidate. The first is the prior probability, derived from external knowledge-base resources (the KB in the paper). The second is a score estimated with a sentence encoder, which encodes the mention to link along with its context. That second score is sims in the formula, because it is the cosine similarity between the encoded mention vector and each entity candidate's vector (which is why this formula is used only if "incl_context" is true).
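
To make the formula a bit more concrete, here is a minimal sketch in plain Python. It is not spaCy's code: the function name combine_scores and the candidate values are made up for illustration. One thing that can be said with certainty is the algebra: p + s - p*s equals 1 - (1 - p)*(1 - s). When both values lie in [0, 1], that reads like the probability that at least one of two independent signals fires, so a candidate scores high if either the prior or the context similarity is high. That reading is only an interpretation, and it is loose here because cosine similarity can be negative.

```python
def combine_scores(prior_probs, sims):
    """Combine prior probabilities and similarities element-wise.

    Hypothetical helper mirroring the formula from the question:
    score = prior + sim - prior * sim
    """
    return [p + s - p * s for p, s in zip(prior_probs, sims)]


# Made-up values for three entity candidates of one mention.
prior_probs = [0.7, 0.2, 0.1]  # prior probability from the KB
sims = [0.1, 0.8, 0.3]         # cosine similarity with the encoded context

scores = combine_scores(prior_probs, sims)

# Same values via the algebraically equivalent form 1 - (1 - p) * (1 - s).
alt_scores = [1 - (1 - p) * (1 - s) for p, s in zip(prior_probs, sims)]

# Index of the highest-scoring candidate.
best = max(range(len(scores)), key=lambda i: scores[i])

print(scores)  # approximately [0.73, 0.84, 0.37]
print(best)
```

In this toy example the second candidate wins even though its prior is low, because the context similarity is strong. With the prior alone, the first candidate would have been picked.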