How to find the most frequent noun following the word 'the'?

import nltk
from nltk.corpus import brown

tagged = brown.tagged_words(tagset='universal')

I understand that finding the most frequent word following 'the' is done like so:

cfd3 = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

cfd3['the'].max()
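For readers new to ConditionalFreqDist, here is a rough standard-library sketch of what the two lines above compute: for each word, a counter of the words that immediately follow it, from which the most frequent follower of 'the' is read off. The short word list is a made-up stand-in for brown.words(), and follower_counts is a hypothetical helper, not part of NLTK.

```python
from collections import Counter, defaultdict

def follower_counts(words):
    """Map each word to a Counter of the words that immediately follow it."""
    counts = defaultdict(Counter)
    # zip(words, words[1:]) yields consecutive pairs, like nltk.bigrams(words)
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    return counts

# Toy stand-in for brown.words() (hypothetical data)
words = "the cat sat on the mat and the cat ran".split()
cfd = follower_counts(words)

# Counter.most_common(1) plays the role of FreqDist.max() here
most_frequent, count = cfd["the"].most_common(1)[0]
print(most_frequent, count)  # cat 2
```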

However, how would one go about finding the most frequent noun following the word 'the'?

Solution

Make a FreqDist that counts only the nouns that follow the word "the".

The Brown corpus has a very rich tagset, so let's simplify things by asking for the coarser "universal" tagset, in which all nouns are tagged "NOUN".

>>> noundist = nltk.FreqDist(w2 for ((w1, t1), (w2, t2)) in
...             nltk.bigrams(brown.tagged_words(tagset="universal"))
...             if w1.lower() == "the" and t2 == "NOUN")
>>> noundist.most_common(10)
[('world', 346), ('time', 250), ('way', 236), ('end', 206), ('fact', 194), ('state', 190), 
('man', 176), ('door', 172), ('house', 152), ('city', 127)]

The generator expression unpacks the two (word, tag) tuples that form each bigram, ((w1, t1), (w2, t2)); checks that the first word (lowercased) is "the" and that the second is tagged "NOUN"; and, if so, passes only the second word, w2, to the FreqDist to be counted.
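The same filter-then-count idea can be sketched without NLTK or the Brown corpus, which makes it easy to check the logic on a tiny input. The tagged sentence below is invented for illustration and only mimics the (word, tag) pairs that brown.tagged_words(tagset="universal") returns; collections.Counter stands in for FreqDist, and nouns_after is a hypothetical helper.

```python
from collections import Counter

def nouns_after(tagged, target="the"):
    """Count nouns that immediately follow `target` (case-insensitive)."""
    # Pair each (word, tag) tuple with its successor, like nltk.bigrams
    pairs = zip(tagged, tagged[1:])
    return Counter(
        w2
        for (w1, t1), (w2, t2) in pairs
        if w1.lower() == target and t2 == "NOUN"
    )

# Invented sample in the universal-tagset (word, tag) format
tagged = [
    ("The", "DET"), ("dog", "NOUN"), ("chased", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("past", "ADP"),
    ("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), (".", "."),
    ("The", "DET"), ("dog", "NOUN"), ("slept", "VERB"), (".", "."),
]
counts = nouns_after(tagged)
print(counts.most_common(2))  # [('dog', 2), ('cat', 1)]
```

Because only the word immediately after "the" is checked, "the old dog" contributes nothing: the word right after "the" is the adjective "old", not a noun.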