I am building a Python text classification application. The user provides a short sentence (or a single word) and the app classifies it. The problem I'm facing is finding a way to check whether the input string is made up of real words.
Examples of user input:
1) "asdfasdfa"
2) "This is adsfgafdga"
Example 1 is not a word, so I want to raise an error. Example 2 contains a non-word string, so I want to raise an error there too.
Correct Examples:
1) "Hello"
2) "This is good"
Is there a way to do that without a list of words, or does anyone know of an API that does it?
One thorough method is to store the dictionary words in a list (or a set, which makes lookups faster). First split the user input into individual words using phrase.split():
words = phrase.split()
# words: ['This', 'is', 'good']
len(words)
# number of words: 3
Then loop over the words in the phrase; this works just as well when the input is a single word. After that it's merely a matter of checking whether each word is present in the list, using the following.
for word in words:
    if word in dictionary_words:
        print("Word is available")
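Putting the split and the membership check together, a minimal sketch could look like the following. The names find_unknown_words, validate_phrase and DICTIONARY_WORDS are made up for illustration, and the four-word set is only a stand-in for a real word list that you would load yourself. It also assumes you want to ignore case and surrounding punctuation, which the question does not specify.

```python
import string

# Hypothetical tiny word set for illustration only; a real application would
# load a full word list (for example from a file with one word per line).
DICTIONARY_WORDS = {"hello", "this", "is", "good"}


def find_unknown_words(phrase, dictionary_words):
    """Return the tokens in phrase that are not in dictionary_words."""
    unknown = []
    for token in phrase.split():
        # Assumption: ignore case and surrounding punctuation,
        # so that "Hello," is looked up as "hello".
        word = token.strip(string.punctuation).lower()
        if word not in dictionary_words:
            unknown.append(token)
    return unknown


def validate_phrase(phrase, dictionary_words=DICTIONARY_WORDS):
    """Raise ValueError if phrase is empty or contains an unknown word."""
    if not phrase.split():
        raise ValueError("empty input")
    unknown = find_unknown_words(phrase, dictionary_words)
    if unknown:
        raise ValueError("not recognised as words: " + ", ".join(unknown))
```

With this, validate_phrase("This is good") returns quietly, while validate_phrase("This is adsfgafdga") raises a ValueError naming the offending token. Using a set rather than a list keeps each lookup fast even with a large dictionary.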
There's a neat XML version of the dictionary words you can use instead of the list.
For a more sophisticated solution, you can use a spell-checking library such as PyEnchant. Install it with pip install pyenchant and then import it:
>>> import enchant
>>> help(enchant)
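As a rough sketch of how PyEnchant could be applied to the problem in the question: enchant.Dict and its check() method are part of PyEnchant, but the helper names make_english_checker and unknown_words are invented here, and the "en_US" tag assumes that dictionary is installed on your system. Stripping punctuation before checking is also an assumption.

```python
import string


def make_english_checker(tag="en_US"):
    """Build a PyEnchant dictionary (requires `pip install pyenchant`)."""
    # Imported lazily so the helper below can be used with any object
    # that offers a check(word) method.
    import enchant
    return enchant.Dict(tag)


def unknown_words(phrase, checker):
    """Return the tokens in phrase that checker.check() does not accept."""
    unknown = []
    for token in phrase.split():
        # Assumption: surrounding punctuation is not part of the word.
        word = token.strip(string.punctuation)
        # An empty string is never passed to check(); it counts as unknown.
        if not word or not checker.check(word):
            unknown.append(token)
    return unknown
```

You would call it as unknown_words(user_input, make_english_checker()) and raise your error whenever the returned list is not empty. Which words pass depends entirely on the installed dictionary, so proper nouns and slang may be rejected.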