My problem: I want to check if the provided word is a common English word. I'm currently using pyenchant to see if a word is an actual word, but I can't find a function in it that returns the frequency of a word or tells me whether it's a common word.
Example code:
import enchant

eng_dict = enchant.Dict("en_US")
words = ['hello', 'world', 'thisisntaword', 'anachronism']
good_words = []
for word in words:
    # currently this checks if it's an English word, but I also want it to check if it's a commonly used word
    if eng_dict.check(word):
        good_words.append(word)
print(good_words)
What it returns as is: ['hello', 'world', 'anachronism']
What I want it to return: ['hello', 'world']
because anachronism is obviously not a common word.
Any solutions to my problem?
You could use the Google Ngram API for this:
import requests

# Placeholder: replace with your own noun phrase or string of noun phrases
content = "anachronism"

url = "https://books.google.com/ngrams/json"
query_params = {
    "content": content,
    "year_start": 2017,
    "year_end": 2019,
    "corpus": 26,
    "smoothing": 1,
    "case_insensitive": True
}
response = requests.get(url=url, params=query_params)
This API lets you access v3 of the Google ngram database, which is the most recent version available. Note, however, that the API is not officially documented, and since you run into rate limits quite easily, it's not suitable for production use. Alternative tools are PhraseFinder (https://phrasefinder.io/) and NGRAMS (https://ngrams.dev/). PhraseFinder is a wrapper around v2 of the Google ngram database; NGRAMS is a wrapper around v3 of the same database. They are both free and can handle more traffic than the Google API.
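Whichever service you use, you still have to turn the frequencies into a yes/no "common word" decision. Here is a minimal sketch of that step. It assumes the JSON response is a list of entries with "ngram" and "timeseries" keys (the API is undocumented, so check what you actually get back), and that case-insensitive results may carry an " (All)" suffix. The function name common_words, the 1e-6 threshold, and the sample numbers are made up for illustration; you would pass response.json() instead of sample_entries and tune the threshold yourself.

```python
from statistics import mean


def common_words(words, ngram_entries, threshold=1e-6):
    """Keep the words whose mean relative frequency is at least `threshold`.

    `ngram_entries` is assumed to look like the parsed JSON response:
    a list of dicts with "ngram" and "timeseries" keys.
    """
    freq = {}
    for entry in ngram_entries:
        # Assumption: case-insensitive results may be named e.g. "hello (All)"
        name = entry["ngram"].replace(" (All)", "").lower()
        series = entry.get("timeseries") or [0.0]
        freq[name] = mean(series)
    # Words missing from the response are treated as having frequency 0
    return [w for w in words if freq.get(w.lower(), 0.0) >= threshold]


# Made-up frequencies for illustration only, not real ngram data
sample_entries = [
    {"ngram": "hello", "timeseries": [2e-5, 3e-5, 4e-5]},
    {"ngram": "world", "timeseries": [3e-4, 3e-4, 3e-4]},
    {"ngram": "anachronism", "timeseries": [4e-7, 5e-7, 6e-7]},
]

print(common_words(["hello", "world", "anachronism"], sample_entries))
```

You can run this on the list that already passed eng_dict.check(), so pyenchant filters out non-words and the frequency threshold filters out rare ones.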