nlp semantics linguistics semantic-analysis

The distance between the meaning of two sentences

I am looking for a way to measure the semantic distance between two sentences. Suppose we have the following sentences:

(S1) The beautiful cherry blossoms in Japan. 
(S2) The beautiful Japan.

S2 is created from S1 by removing the words "cherry", "blossoms" and "in". I want to define a function that assigns a large distance between S1 and S2, because their meanings differ significantly: in S1, "beautiful" modifies "cherry blossoms", not "Japan".

Solution

Research in this area has advanced considerably, and the distance between the meanings of two sentences can now be computed in several ways, thanks to the development of word vectors and transformers:

Google universal sentence encoder (USE): https://tfhub.dev/google/universal-sentence-encoder/2
InferSent by Facebook: https://github.com/facebookresearch/InferSent
Averaging the word vectors (with cosine similarity).
spaCy also provides a similarity score between two sentences based on word vectors: https://spacy.io/usage/spacy-101
ELMo: https://github.com/HIT-SCIR/ELMoForManyLangs
BERT: https://github.com/google-research/bert
ALBERT: https://github.com/google-research/ALBERT
RoBERTa: https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/
XLNet: https://github.com/zihangdai/xlnet
ELECTRA: https://github.com/google-research/electra

etc.
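As a rough illustration of the simplest option in the list (averaging word vectors and comparing with cosine similarity), here is a minimal sketch in plain Python. The three-dimensional vectors in `TOY_VECTORS` are invented for this example and do not come from any real model, and the helper names `sentence_vector` and `cosine_distance` are my own. In practice you would load pretrained vectors or use one of the encoders listed above.

```python
import math

# Hypothetical 3-dimensional word vectors, invented purely for illustration.
# Real vectors would come from a pretrained model and have far more dimensions.
TOY_VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "beautiful": [0.9, 0.2, 0.1],
    "cherry": [0.2, 0.9, 0.3],
    "blossoms": [0.3, 0.8, 0.4],
    "in": [0.1, 0.1, 0.2],
    "japan": [0.2, 0.3, 0.9],
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of all known words in the sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


s1 = "The beautiful cherry blossoms in Japan."
s2 = "The beautiful Japan."
distance = cosine_distance(sentence_vector(s1), sentence_vector(s2))
print(f"cosine distance between S1 and S2: {distance:.3f}")
```

Note that averaging ignores word order, so it cannot by itself represent which noun "beautiful" modifies. To try a sentence encoder from the list instead, you would replace `sentence_vector` with a call to that model and keep `cosine_distance` unchanged.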