Tags: python-3.x, nlp, katacoda

POS Tagging in NLP


I am taking a course on NLTK in Python that has a hands-on problem (on Katacoda) on "Text Corpora", and it is not accepting my solution shown below. I have been stuck on this problem for a long time and need to complete this hands-on to move forward in the course.

Problem Definition

  1. Import the text corpus brown.
  2. Extract the list of tagged words from the corpus brown. Store the result in brown_tagged_words.
  3. Generate trigrams of brown_tagged_words and store the result in brown_tagged_trigrams.
  4. For every trigram of brown_tagged_trigrams, determine the tags associated with each word. This results in a list of tuples, where each tuple contains the POS tags of 3 consecutive words occurring in the text. Store the result in brown_trigram_pos_tags.
  5. Determine the frequency distribution of brown_trigram_pos_tags and store the result in brown_trigram_pos_tags_freq.
  6. Print the number of occurrences of the trigram ('JJ','NN','IN').

For this I have tried the solution below:
import nltk
from nltk.corpus import brown
brown_tagged_words = [w for w in brown.tagged_words()]
brown_tagged_trigrams = nltk.trigrams(brown_tagged_words)
brown_trigram_pos_tags = [(w1[1],w2[1],w2[1]) for w1,w2,w3 in brown_tagged_trigrams]
brown_trigram_pos_tags_freq = nltk.FreqDist(brown_trigram_pos_tags)
print(brown_trigram_pos_tags_freq[('JJ', 'NN', 'IN')])

Solution

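  • brown_trigram_pos_tags = [(w1[1],w2[1],w3[1]) for w1,w2,w3 in brown_tagged_trigrams]

    In the third position of the tuple, change w2[1] to w3[1]. The original line repeats the second word's tag, so every tuple ends in two identical tags and ('JJ','NN','IN') is never counted. With this change the result should be around 8.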
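
    To see the effect of the fix without downloading the Brown corpus, here is a minimal sketch on made-up data. The tagged_words list and the trigrams helper are hypothetical stand-ins for brown.tagged_words() and nltk.trigrams, and collections.Counter stands in for nltk.FreqDist; this is only an illustration of the indexing mistake, not the course's reference solution.

```python
from collections import Counter

# Hypothetical toy data standing in for brown.tagged_words(): (word, tag) pairs.
tagged_words = [
    ("the", "AT"), ("big", "JJ"), ("dog", "NN"), ("in", "IN"),
    ("the", "AT"), ("old", "JJ"), ("house", "NN"), ("of", "IN"),
    ("stone", "NN"),
]


def trigrams(items):
    """Return every run of three consecutive items (stand-in for nltk.trigrams)."""
    return zip(items, items[1:], items[2:])


# Version from the question: the third slot repeats the second word's tag.
buggy_tags = [(w1[1], w2[1], w2[1]) for w1, w2, w3 in trigrams(tagged_words)]

# Corrected version: the third slot uses the third word's tag.
fixed_tags = [(w1[1], w2[1], w3[1]) for w1, w2, w3 in trigrams(tagged_words)]

# Counter plays the role of nltk.FreqDist here; a missing key counts as 0.
buggy_freq = Counter(buggy_tags)
fixed_freq = Counter(fixed_tags)

print(buggy_freq[("JJ", "NN", "IN")])  # 0: the last two tags are always equal
print(fixed_freq[("JJ", "NN", "IN")])  # 2: "big dog in" and "old house of"
```

    On the real corpus the same one-character change applies to the brown_trigram_pos_tags line from the question; the rest of that code can stay as it is.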