Forgive me if my wording is awful, but I'm trying to figure out how to determine the most used words in the English language from a set of words in a dictionary I've made. I've done some research on NLTK but can't seem to find a function within it (or any other library for that matter) that will help me do what I need to do.
For example: the sentence "I enjoy a cold glass of water on a hot day" would return "water" because, out of the words in that sentence, it's the one used most in day-to-day conversation. Essentially I need a returned value of the most frequently used word in conversation.
I figure I'll likely have to involve AI, but any time I've tried to use AI I wind up copying and pasting code I just don't understand, so I'm trying to avoid going that route.
Any and all help is welcome and appreciated.
For context, I decided to start a project that would essentially guess a predetermined word based on characters the user says it does and doesn't have from the computer's guess.
You need an external dataset for this task. You can try a dataset such as the Google Ngram dataset.
Here is the breakdown of the problem statement:
Input: ["I", "enjoy", "a", "cold", "glass", "of", "water", "on", "a", "hot", "day"]

Look up how often each word occurs in the external dataset, e.g. "I" occurs 5000000 times: { "I": 5000000, "enjoy": 50000, "a": 10000000, "cold": 30000, "glass": 100000, "of": 8000000, "water": 1200000, "on": 6000000, "hot": 700000, "day": 400000 }

Output: "water" (the most frequent word once common filler words like "a", "of", and "on" are ignored, since those would otherwise dominate)
Note: you can use any big corpus as the external data. A big corpus will contain most of the English words used in conversation, and even if the frequencies aren't provided, you can compute them yourself from the corpus.
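For example, here is a sketch of computing the frequencies yourself, using NLTK's Brown corpus purely as a stand-in for "any big corpus"; you could load Google Ngram counts into a dict and use it the same way. Note that the result depends on the corpus: with Brown, a word like "day" might edge out "water".

```python
import nltk
from nltk.corpus import brown, stopwords

nltk.download("brown")       # ~1M words of American English text
nltk.download("stopwords")

# word -> how many times it occurs in the corpus
freq = nltk.FreqDist(w.lower() for w in brown.words())
stop = set(stopwords.words("english"))

def most_frequent_word(sentence):
    # Drop stopwords, then return the word with the highest corpus count.
    words = [w.lower() for w in sentence.split() if w.lower() not in stop]
    return max(words, key=lambda w: freq[w])

print(most_frequent_word("I enjoy a cold glass of water on a hot day"))
```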