machine-learning nlp classification supervised-learning

Categorize customer questions based on content


I’m working on a web app where users can ask questions. These questions should be categorized by criteria such as the question content, title, user data, region and so on. The questions should then be handled accordingly: for some, a request for additional information should be sent; others should be deleted or marked as spam; and some should be sent directly to a specialist.

The problem is that users can’t choose the right category themselves: the categories are fairly complex, and users can cheat.

Are there any approaches for doing this automatically? For now, a few people do this job by filtering questions manually. Perhaps some ready-made solutions exist.


Solution

  • This is a really complex task. You should take a look at supervised machine learning classification algorithms. You can try something similar to a spam filtering algorithm (https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering):

    1. Gather a set of questions that have already been categorized (labeled examples).
    2. Build a vocabulary of words that are useful for classifying questions (words that identify a group).
    3. Preprocess the question text by removing “stop words” and replacing words with their stems.
    4. Map the question text, title, user data and so on to numbers (a question vector).
    5. Use an algorithm such as SVM to train a classifier (model) and apply it to new questions.
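
Steps 2 to 4 above could be sketched roughly as follows in plain Python. This is only an illustration under assumptions: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer (such as a Porter stemmer), and all function names are invented for this example. Only the question text is handled here; title, user data and region would need their own numeric encoding.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "my", "i", "to", "of",
              "and", "in", "it", "on", "for", "do", "how"}


def crude_stem(word):
    """Strip a few common English suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Step 3: lowercase, split into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def build_vocabulary(texts):
    """Step 2: map every stem seen in the labeled questions to a vector index."""
    stems = sorted({stem for text in texts for stem in tokenize(text)})
    return {stem: index for index, stem in enumerate(stems)}


def to_vector(text, vocabulary):
    """Step 4: turn a question into a vector of per-stem counts."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for stem, count in counts.items():
        if stem in vocabulary:  # stems outside the vocabulary are ignored
            vector[vocabulary[stem]] = count
    return vector


# Example usage with made-up questions.
vocabulary = build_vocabulary(["payment failed", "refund payment"])
print(vocabulary)
print(to_vector("Payments, payments failing!", vocabulary))
```

The resulting count vectors are the kind of input that a classifier in step 5 would be trained on.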

    But this is only a very general approach to look at. It’s hard to say anything more specific without additional details. I don’t think you will find a ready-made solution, since this is a pretty specific task. But of course you can use one of the many machine-learning frameworks.
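
As one possible starting point, here is a minimal sketch of a multinomial Naive Bayes classifier, in the spirit of the spam filtering article linked above. Note that it uses Naive Bayes rather than the SVM mentioned in step 5, simply because it fits in a few lines without any library. The class name, the category names and the training questions are all made up for illustration; a real system would be trained on the questions that were already categorized by hand.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase the text and split it into words (no stemming, for brevity)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocabulary = set()

    def train(self, labeled_examples):
        for text, category in labeled_examples:
            self.doc_counts[category] += 1
            for word in tokens(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior: how common the category is among the labeled examples.
            score = math.log(n_docs / total_docs)
            denominator = sum(self.word_counts[category].values()) + len(self.vocabulary)
            for word in tokens(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                # Smoothed log likelihood of the word given the category.
                score += math.log((self.word_counts[category][word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical labeled examples (step 1); real data would come from past moderation.
training_data = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("my invoice shows the wrong amount", "billing"),
    ("refund for a double charge on my card", "billing"),
    ("app crashes when I upload a photo", "technical"),
    ("error message after login", "technical"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_data)
print(classifier.predict("free prize offer"))
print(classifier.predict("wrong charge on invoice"))
```

The predicted category could then drive the handling described in the question, for example deleting spam or forwarding a billing question to a specialist. With only a handful of examples the predictions are not reliable, so this shows the mechanics rather than a usable model.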