Tags: python-3.x, nlp, gensim, doc2vec, sentence-similarity

How to extract sentences that have a similar meaning/intent to an example list of sentences


I have chat interactions [Utterances] between a Customer and an Advisor, and I want to know whether the Advisor's messages contain particular sentences, or sentences similar to those in the list below.

Example sentences I am looking for in the Advisor interactions:

["I would be more than happy to help you with this",
"I would be happy to look over the account to see how I can help get this sorted out for you",
"I’d be more than happy to look into this for you!",
"Oh, I see, let me assist you with this concern.",
"I am more than happy to do everything I can to resolve this matter for you.",
"I would be happy to look over the account to see how I can help get this sorted out for you.",
"I am happy to have a look."]


I have a dataset containing a list of interaction_id values and their Utterances (sample below):

Example chat interaction between Advisor and Client:

```
Client: Hello I would like to place an order for replacement battery
Agent: Hi Welcome to Battery service department. I would be happy to help you with your battery replacement Order.
```

How do I extract the sentences with a similar intent or meaning? I am new to NLP, and I believe I have a sentence classification/extraction problem on my hands. Is there any way I can achieve this?

Basically, I am trying to produce the following:

| ID | Utt | Help_Stmt_Present |
|---|---|---|
| IRJST | Hi Welcome to Battery service department. I would be happy to help you with your battery replacement Order. | Yes |



Solution

  • There are several ways to do this; one common approach works in two steps:
    1. Calculate sentence vectors

    a. Use pretrained word embeddings (GloVe, word2vec, fastText, etc.): look up the embedding of each word, then average those embeddings across the words of the sentence to get the sentence embedding.

    b. Use the Universal Sentence Encoder to get the sentence embeddings directly.

    2. Calculate similarity and find matches

    a. Calculate the distance between the target sentence and each of the other N sentences using Euclidean distance, cosine distance, or any other metric that works best for your problem.

    b. Fit a k-NN model on the N sentence vectors you have, then query it with the target sentence vector to get the K most similar sentences.

    To get even better results, you can use deep-learning-based techniques and state-of-the-art (SOTA) architectures such as transformers and the models built on top of them. You can check out this repository, which solves your task using transformers. To experiment with different architectures and other NLP tasks, you can also check out the Hugging Face repository.
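Steps 1a and 2a can be sketched in plain Python to produce the Help_Stmt_Present flag from the question. This is a minimal, hypothetical sketch: `TOY_VECTORS` is a made-up three-dimensional embedding table standing in for real pretrained vectors, and the regex tokenizer, the naive sentence splitter, the helper names and the 0.7 similarity threshold are all assumptions you would replace and tune on your own data. An utterance is flagged "Yes" if any of its sentences has a cosine similarity at or above the threshold with any example sentence.

```python
import math
import re

# Hypothetical toy embeddings, for illustration only. In practice these would
# come from a pretrained model such as GloVe, word2vec or fastText.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "help": [0.8, 0.3, 0.1],
    "assist": [0.8, 0.2, 0.1],
    "look": [0.6, 0.4, 0.1],
    "welcome": [0.3, 0.8, 0.1],
    "department": [0.1, 0.7, 0.4],
    "battery": [0.0, 0.1, 0.9],
    "order": [0.1, 0.0, 0.8],
}

EXAMPLE_SENTENCES = [
    "I would be more than happy to help you with this",
    "I would be happy to look over the account to see how I can help get this sorted out for you",
    "I'd be more than happy to look into this for you!",
    "Oh, I see, let me assist you with this concern.",
    "I am more than happy to do everything I can to resolve this matter for you.",
    "I am happy to have a look.",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_vector(sentence, vectors):
    """Step 1a: average the vectors of in-vocabulary words; None if no word is known."""
    known = [vectors[w] for w in tokenize(sentence) if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Step 2a: cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_sentences(utterance):
    """Naive splitter on '.', '!' and '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", utterance) if s.strip()]


def help_stmt_present(utterance, examples, vectors, threshold=0.7):
    """Return "Yes" if any sentence of the utterance is similar enough to any example."""
    example_vecs = [v for v in (sentence_vector(e, vectors) for e in examples)
                    if v is not None]
    for sentence in split_sentences(utterance):
        vec = sentence_vector(sentence, vectors)
        if vec is not None and any(cosine(vec, ev) >= threshold for ev in example_vecs):
            return "Yes"
    return "No"


utterance = ("Hi Welcome to Battery service department. "
             "I would be happy to help you with your battery replacement Order.")
print(help_stmt_present(utterance, EXAMPLE_SENTENCES, TOY_VECTORS))  # → Yes
print(help_stmt_present("Your battery order has shipped.", EXAMPLE_SENTENCES, TOY_VECTORS))  # → No
```

To use real vectors, `TOY_VECTORS` could be swapped for a gensim `KeyedVectors` object, for example the one returned by `gensim.downloader.load("glove-wiki-gigaword-100")`, since it supports the same `word in vectors` and `vectors[word]` lookups. Assuming a pandas DataFrame `df` with an `Utt` column, the flag could then be filled with `df["Help_Stmt_Present"] = df["Utt"].apply(lambda u: help_stmt_present(u, EXAMPLE_SENTENCES, vectors))`. Note that averaging dilutes the signal when a sentence mixes a help phrase with domain words (like "battery" above), which is one reason sentence-level encoders often do better.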
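Step 2b can likewise be sketched as a brute-force k-NN search. Fitting a k-NN model essentially amounts to storing the vectors, so a query just ranks every stored vector by its cosine distance to the target and keeps the K closest. The sentence vectors below are hypothetical two-dimensional values standing in for the output of step 1, and `k_nearest` / `cosine_distance` are illustrative names, not a library API.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def k_nearest(target, candidates, k):
    """Brute-force k-NN: (index, distance) pairs of the k candidates closest to target."""
    scored = ((i, cosine_distance(target, vec)) for i, vec in enumerate(candidates))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical precomputed sentence vectors (e.g. produced in step 1).
sentence_vectors = [
    [0.9, 0.1],  # 0: a "happy to help" style sentence
    [0.1, 0.9],  # 1: an unrelated sentence
    [0.8, 0.3],  # 2: another helpful-sounding sentence
    [0.0, 1.0],  # 3: unrelated
]
target = [1.0, 0.0]

print([i for i, _ in k_nearest(target, sentence_vectors, k=2)])  # → [0, 2]
```

For larger collections, scikit-learn's `NearestNeighbors` does the same job: `NearestNeighbors(n_neighbors=2, metric="cosine").fit(sentence_vectors).kneighbors([target])` returns the distances and indices of the nearest neighbours.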