Tags: machine-learning, nlp, deep-learning, artificial-intelligence, summarization

Using NLP or machine learning to extract keywords from a sentence


I'm new to the ML/NLP field so my question is what technology would be most appropriate to achieve the following goal:

We have a short sentence - "Where to go for dinner?" or "What's your favorite bar?" or "What's your favorite cheap bar?"

Is there a technology that would enable me to train it providing the following data sets:

  • "Where to go for dinner?" -> Dinner
  • "What's your favorite bar?" -> Bar
  • "What's your favorite cheap restaurant?" -> Cheap, Restaurant

so that next time we have a similar question about an unknown activity, say, "What is your favorite expensive [whatever]" it would be able to extract "expensive" and [whatever]?

The goal is to train it with hundreds (or thousands) of variations of the question, together with the expected output for each, so that it can work with everyday language.

I know how to make it even without NLP/ML if we have a dictionary of expected terms like Bar, Restaurant, Pool, etc., but we also want it to work with unknown terms.

I've seen examples that use RAKE and scikit-learn to classify "things", but I'm not sure how I would feed text into them, and all of those examples had predefined outputs for training.

I've also tried Google's NLP API, Amazon Lex and Wit to see how good they are at extracting entities, but the results are disappointing to say the least.

Reading about summarization techniques, I'm left with the impression it won't work with small, single-sentence texts, so I haven't delved into it.


Solution

  • As @polm23 mentioned, for simple cases you can use POS tagging to do the extraction. Services like the ones you mentioned, as well as LUIS, Dialogflow, etc., use what is called Natural Language Understanding (NLU). They make use of intents and entities (you can find a detailed explanation with examples here). If you are concerned about your data going online, or you sometimes have to work offline, you can always go for RASA.
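
A minimal sketch of the POS-tag approach, assuming the sentence has already been tagged. The (word, tag) pairs below are hand-written with Universal POS labels purely for illustration; in practice they would come from a tagger such as NLTK or spaCy, and real tagger output may differ. The function name `extract_keywords`, the `ignore` parameter, and the choice to keep only adjectives and nouns are assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

# Tags to keep: adjectives ("expensive") and nouns ("place").
# This selection is an assumption made for this sketch.
KEEP_TAGS = {"ADJ", "NOUN", "PROPN"}


def extract_keywords(
    tagged: List[Tuple[str, str]],
    ignore: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return words whose tag is in KEEP_TAGS, in sentence order.

    Words listed in `ignore` (compared in lowercase) are skipped.
    """
    return [
        word
        for word, tag in tagged
        if tag in KEEP_TAGS and word.lower() not in ignore
    ]


# Hand-tagged example (hypothetical tagger output, Universal POS labels).
tagged_sentence = [
    ("What", "PRON"), ("is", "AUX"), ("your", "PRON"),
    ("favorite", "ADJ"), ("expensive", "ADJ"), ("sushi", "NOUN"),
    ("place", "NOUN"), ("?", "PUNCT"),
]

print(extract_keywords(tagged_sentence))
print(extract_keywords(tagged_sentence, ignore=frozenset({"favorite"})))
```

Note that "favorite" is an adjective too, so a purely tag-based filter picks it up unless it is excluded by hand. That kind of manual tuning is what a trained entity extractor is meant to avoid.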

    Things you can do with RASA:

    • Entity extraction and sentence classification. You mark which term should be extracted by tagging its position in a variety of training sentences, so a word that was not in the training set can still be detected.
    • It uses rule-based learning as well as a Keras LSTM for detection.
    • One downside compared with the online services is that you have to tag the position numbers manually in the JSON training file, as opposed to the click-and-tag features the online services offer.
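
The position numbers do not have to be typed by hand; they can be computed. The sketch below builds one training example with character offsets found via `str.find`. The dictionary layout (`text`, `intent`, and `entities` with `start`, `end`, `value`, `entity`) follows the legacy RASA NLU JSON training format as I understand it, so check it against the documentation for your version. The helper name `make_example`, the intent name, and the entity labels `price` and `place` are hypothetical.

```python
import json
from typing import Dict, List


def make_example(text: str, intent: str, entities: Dict[str, str]) -> dict:
    """Build one training example, locating each entity value in the text.

    `entities` maps an entity value (a substring of `text`) to its label.
    Offsets are character positions: `start` inclusive, `end` exclusive.
    """
    tagged: List[dict] = []
    for value, label in entities.items():
        start = text.find(value)  # first occurrence only
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        tagged.append({
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": label,
        })
    return {"text": text, "intent": intent, "entities": tagged}


example = make_example(
    "What's your favorite cheap restaurant?",
    intent="ask_recommendation",  # hypothetical intent name
    entities={"cheap": "price", "restaurant": "place"},  # hypothetical labels
)
print(json.dumps(example, indent=2))
```

Because `str.find` returns only the first occurrence, a sentence that repeats an entity value would need explicit offsets instead.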

    You can find the tutorial here.

    I am having pain in my leg.

    For example, I trained RASA with a variety of sentences to identify a body part and a symptom (I limited it to two entities, but you can add more). When an unknown sentence like the one above appears, it correctly identifies "pain" as "symptom" and "leg" as "body part".

    Hope this answers your question!