nlp semantics linguistics semantic-analysis

The distance between the meaning of two sentences


I am looking for a way to measure the semantic distance between two sentences. Suppose we have the following sentences:

(S1) The beautiful cherry blossoms in Japan. 
(S2) The beautiful Japan.

S2 is created from S1 by removing the words "cherry", "blossoms" and "in". I want to define a function that assigns a large distance to S1 and S2, because the two sentences have significantly different meanings: in S1, "beautiful" modifies "cherry blossoms", not "Japan".


Solution

  • Research in this area has advanced considerably. Thanks to word vectors and transformers, the distance between the meanings of two sentences can now be estimated with several methods:

    1. Google Universal Sentence Encoder (USE): https://tfhub.dev/google/universal-sentence-encoder/2

    2. InferSent by Facebook: https://github.com/facebookresearch/InferSent

    3. Averaging the word vectors of each sentence and comparing the averages with cosine similarity.

    4. spaCy also provides a similarity score between two sentences based on word vectors: https://spacy.io/usage/spacy-101

    5. ELMo: https://github.com/HIT-SCIR/ELMoForManyLangs

    6. BERT: https://github.com/google-research/bert

    7. ALBERT: https://github.com/google-research/ALBERT

    8. RoBERTa: https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/

    9. XLNet: https://github.com/zihangdai/xlnet

    10. ELECTRA: https://github.com/google-research/electra

    etc
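
As a rough illustration of method 3, here is a minimal sketch that averages word vectors and computes a cosine distance for S1 and S2. The 3-dimensional vectors in `TOY_VECTORS` are invented for this example and carry no real semantics; in practice you would load pretrained embeddings instead. The helper names `sentence_vector` and `cosine_distance` are hypothetical, not part of any library.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings (e.g. word2vec, GloVe) have hundreds of dimensions.
TOY_VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "beautiful": [0.9, 0.2, 0.1],
    "cherry": [0.2, 0.9, 0.3],
    "blossoms": [0.3, 0.8, 0.4],
    "in": [0.1, 0.1, 0.2],
    "japan": [0.2, 0.3, 0.9],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of all known words in the sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


s1 = "The beautiful cherry blossoms in Japan."
s2 = "The beautiful Japan."
distance = cosine_distance(
    sentence_vector(s1, TOY_VECTORS),
    sentence_vector(s2, TOY_VECTORS),
)
print(f"cosine distance between S1 and S2: {distance:.3f}")
```

Note that plain averaging ignores word order and which word "beautiful" modifies, so it may well report a small distance for S1 and S2. The sentence encoders in the list are trained on whole sentences and may capture such differences better, though that would need to be checked on your own data.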