Tags: python, string, random, nltk, vocabulary

Generate a string of N random English words with NLTK/Python


Is there a way to generate a string of N random English words using NLTK/Python?

I am aware of NLTK's ability to generate sentences based on input text and a grammar, but I don't need sentences that follow any sort of grammar. I just need to randomly select N words from a given dictionary/vocabulary and concatenate them into a string. I am also aware of the ability to generate random strings of characters, and of how to use NLTK to generate "English-looking" nonsense words using n-grams, but I need the words to be actual English words from some dictionary file.

I tried doing this:

from nltk.corpus import words
from random import sample

n = 100
rand_words = ' '.join(sample(words, n))

But words is a corpus reader object rather than a sequence of words, so sample() can't use it this way. What is the correct way to create a random string of English words using NLTK's built-in dictionaries?


Solution

  • You just need to call the corpus reader's words() method, which returns the vocabulary as a list that sample() can draw from:

    rand_words = ' '.join(sample(words.words(), n))
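
For a fuller picture, here is a minimal, self-contained sketch of the same idea. The helper names load_nltk_words and random_word_string, and the unique and seed parameters, are made up for illustration and are not part of NLTK. The sketch assumes nltk is installed and that the "words" corpus can be downloaded if it is missing. The NLTK import is kept inside the loader so the sampling function works with any list of words.

```python
import random


def load_nltk_words():
    """Return NLTK's 'words' corpus as a list of strings.

    Hypothetical helper: assumes nltk is installed and, if the corpus
    is not present locally, that it can be downloaded.
    """
    import nltk
    from nltk.corpus import words

    try:
        nltk.data.find("corpora/words")  # raises LookupError if missing
    except LookupError:
        nltk.download("words")  # one-time download of the word list
    return words.words()


def random_word_string(vocabulary, n, unique=True, seed=None):
    """Join n randomly chosen words from vocabulary with single spaces.

    unique=True  draws without replacement (random.sample), so it raises
                 ValueError if n is larger than the vocabulary.
    unique=False draws with replacement (random.choices), so repeats
                 are possible and n may exceed the vocabulary size.
    seed         optional; makes the result reproducible.
    """
    rng = random.Random(seed)
    vocabulary = list(vocabulary)  # sample() needs a sequence, not an arbitrary iterable
    if unique:
        chosen = rng.sample(vocabulary, n)
    else:
        chosen = rng.choices(vocabulary, k=n)
    return " ".join(chosen)
```

With the corpus available, random_word_string(load_nltk_words(), 100) should give the 100-word string asked for in the question. Passing unique=False is one option if repeated words are acceptable or if n might exceed the size of the vocabulary.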