nlp artificial-intelligence

How to use natural language processing to map text to a preset list of topics


I'd like to use a service such as Google's Natural Language API to classify arbitrary user questions into a preset list of topics. I have advanced programming experience and want to use Google's service as a base, building a codebase around it if necessary to accomplish our goal. An example use case would be:

Hardcoded preset list of topics:
Baseball
Football
Soccer

Sample user questions and expected results:
How do I cook pasta? RESULT: No results
What is a referee? RESULT: Baseball/Football/Soccer
What is a home run? RESULT: Baseball

1) Does anything like this already exist to classify arbitrary user text into a preset list of topics?
2) If not, is there an existing programming concept that shows ways to implement this, or that would let me learn the ideas behind it? (I searched on Google and couldn't find anything; I may simply not know what to look for.)
3) If not, is there any guidance on how this could be implemented?


Solution

  • This sounds like a basic classification problem, or more specifically maybe intent classification.

    Google has a guide to creating a classification program. You should start with that.

    The output of the classifier there will give you a list of topics with a probability for each. If you want to allow multiple topics per question, which is harder to get right, you can take all topics with a probability above a threshold. You'll have to determine the threshold by experimentation.

    The default model has a fixed list of categories, but this guide walks you through setting up custom categories.


    If you are willing to look outside Google Cloud, it may be easier to find guides to text classification. spaCy has an excellent guide you can use to get rolling quickly.
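
The thresholding step described in the answer can be sketched in a few lines of Python. This is only a rough illustration, not Google's or spaCy's API: the function name `topics_above_threshold`, the 0.5 cutoff, and the per-topic scores are all made up for the example, and it assumes the classifier returns an independent score for each topic (so the scores need not sum to 1).

```python
from typing import Dict, List


def topics_above_threshold(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every topic whose score meets the threshold, highest score first.

    `scores` maps a topic name to the classifier's score for that topic.
    The 0.5 default is a placeholder; tune it on your own labelled data.
    """
    kept = [(topic, score) for topic, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [topic for topic, _ in kept]


# Hypothetical classifier output for the three sample questions.
# These numbers are invented for illustration, not real model scores.
examples = {
    "How do I cook pasta?": {"Baseball": 0.03, "Football": 0.02, "Soccer": 0.04},
    "What is a referee?": {"Baseball": 0.55, "Football": 0.81, "Soccer": 0.78},
    "What is a home run?": {"Baseball": 0.97, "Football": 0.08, "Soccer": 0.05},
}

for question, scores in examples.items():
    topics = topics_above_threshold(scores)
    print(question, "RESULT:", "/".join(topics) if topics else "No results")
```

An empty list plays the role of "No results" from the question. Raising the threshold trades missed topics for fewer false matches, which is why it has to be chosen by experiment.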