python machine-learning nlp artificial-intelligence nltk

NLTK: detecting whether a sentence is interrogative or not

I want to write a Python script, using NLTK or whichever library is best, that correctly identifies whether a given sentence is interrogative (a question) or not. I tried using regex, but there are harder cases where regex fails, so I would like to use natural language processing instead. Can anybody help?

Solution

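Training a dialogue-act classifier on NLTK's NPS Chat corpus, whose posts are labelled with classes such as ynQuestion and Statement, will probably solve your problem.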

Here is the code:

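import nltk

# 'nps_chat' holds the labelled chat posts; 'punkt' is needed by word_tokenize
# (newer NLTK releases may ask for 'punkt_tab' instead).
nltk.download('nps_chat')
nltk.download('punkt')
posts = nltk.corpus.nps_chat.xml_posts()[:10000]


def dialogue_act_features(post):
    """Bag-of-words features: one 'contains(word)' flag per token."""
    features = {}
    for word in nltk.word_tokenize(post):
        features['contains({})'.format(word.lower())] = True
    return features

# Pair each post's features with its dialogue-act label (e.g. 'ynQuestion').
featuresets = [(dialogue_act_features(post.text), post.get('class')) for post in posts]
# Hold out the first 10% for testing and train on the rest.
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))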

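That should print something like 0.67, which is decent accuracy for such a simple model, given that it is measured across all the dialogue-act classes rather than just question versus non-question. To run a string of text through this classifier, try: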

# Any input sentence will do; this one is only an example.
line = 'Is this a question?'
print(classifier.classify(dialogue_act_features(line)))

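You can then sort strings by their predicted class (ynQuestion, whQuestion, Statement, etc.) and keep the ones you want. For detecting questions, treat a prediction of whQuestion or ynQuestion as interrogative.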
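Since the original question only needs a yes/no answer, the predicted class can be collapsed into a boolean. The following is a minimal sketch rather than part of NLTK: the names QUESTION_LABELS, is_question and question_accuracy are made up for illustration, and treating only whQuestion and ynQuestion as questions is an assumption you may want to adjust. It works with any object that has a classify(features) method, such as the classifier trained above.

```python
# Labels treated as questions. Assumption: only the NPS Chat classes
# "whQuestion" and "ynQuestion" count as interrogative; adjust if needed.
QUESTION_LABELS = frozenset({"whQuestion", "ynQuestion"})


def is_question(sentence, classifier, feature_fn):
    """Return True if the classifier assigns a question label to sentence.

    classifier: any object with a classify(features) method, e.g. the
        trained NaiveBayesClassifier from the answer.
    feature_fn: maps a string to a feature dict, e.g. dialogue_act_features.
    """
    return classifier.classify(feature_fn(sentence)) in QUESTION_LABELS


def question_accuracy(classifier, labelled_featuresets):
    """Accuracy of the binary question / not-question decision.

    labelled_featuresets: (features, gold_label) pairs, e.g. test_set.
    """
    if not labelled_featuresets:
        raise ValueError("need at least one labelled example")
    # An example counts as correct when prediction and gold label fall on
    # the same side of the question / not-question split.
    correct = sum(
        (classifier.classify(features) in QUESTION_LABELS)
        == (gold in QUESTION_LABELS)
        for features, gold in labelled_featuresets
    )
    return correct / len(labelled_featuresets)
```

With the objects from the answer, this would be called as is_question(line, classifier, dialogue_act_features) and question_accuracy(classifier, test_set). The binary score is usually the more relevant number here, because confusing two non-question classes (say, Greet and Statement) no longer counts as an error.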

This approach uses Naive Bayes, which in my opinion is the easiest option, though there are certainly other ways to do this. Hope this helps!