I have a function that returns the parts of speech of every word as a list of tuples. When I execute it, I only get the result for the first element (the first tuple). I want to get the result for every element (tuple) in that list. For example:
get_word_pos("I am watching")
I get this as the result:
[('I', 'PRP'), ('am', 'VBP'), ('watching', 'VBG')]
But what I want is the corresponding WordNet part of speech for each of these tuples, not just for the first one.
The function I have written contains multiple return statements, which is why I only get the first element as output. Could someone please modify my function so that I get the desired output? The code is as follows:
import nltk
from nltk.corpus import state_union, wordnet

training = state_union.raw("2005-GWBush.txt")
tokenizer = nltk.tokenize.punkt.PunktSentenceTokenizer(training)

def get_word_pos(word):
    sample = word
    tokenized = tokenizer.tokenize(sample)
    for i in tokenized:
        words = nltk.word_tokenize(i)
        tagged = nltk.pos_tag(words)
        for letter in tagged:
            # letter is a (word, tag) tuple; every branch below returns at once
            if letter[1].startswith('J'):
                return wordnet.ADJ
            elif letter[1].startswith('V'):
                return wordnet.VERB
            elif letter[1].startswith('N'):
                return wordnet.NOUN
            elif letter[1].startswith('R'):
                return wordnet.ADV
            return wordnet.NOUN
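As you iterate over tagged, you return a value as soon as you reach the first item, so the loop never gets to the remaining items. You need to accumulate the values instead and return them once the loop has finished. Appending them to a list would be one way of doing it. For example: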
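from nltk import word_tokenize, pos_tag
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import wordnet

training = state_union.raw('2005-GWBush.txt')
tokenizer = PunktSentenceTokenizer(training)

def get_word_pos(word):
    result = []
    for token in tokenizer.tokenize(word):
        words = word_tokenize(token)
        for t in pos_tag(words):
            # t is a (word, tag) tuple; match on the first letter of the tag
            # (match/case requires Python 3.10 or later)
            match t[1][0]:
                case 'J':
                    result.append(wordnet.ADJ)
                case 'V':
                    result.append(wordnet.VERB)
                case 'R':
                    result.append(wordnet.ADV)
                case _:
                    # nouns and every other tag default to NOUN
                    result.append(wordnet.NOUN)
    # return only after every word has been processed
    return result

print(get_word_pos('I am watching'))

Output:
['n', 'v', 'v']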
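If you are on a Python version older than 3.10, or simply prefer to avoid match, the same accumulation can be sketched with a dictionary lookup. This is only an illustration: the names TAG_PREFIX_TO_WORDNET and to_wordnet_pos are hypothetical, and the single-letter codes are hard-coded on the assumption that they equal wordnet.ADJ, wordnet.VERB, wordnet.NOUN and wordnet.ADV, so that the snippet runs without the NLTK corpora. It takes the already tagged list (the output of pos_tag) as input.

```python
# Single-letter codes assumed to match nltk.corpus.wordnet.ADJ, VERB, NOUN, ADV.
# They are hard-coded here (for illustration) so the sketch runs without NLTK data.
TAG_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def to_wordnet_pos(tagged, default="n"):
    """Map each (word, Penn Treebank tag) pair to a WordNet POS code.

    Tags whose first letter is not in the table fall back to `default`,
    mirroring the `case _` branch above.
    """
    return [TAG_PREFIX_TO_WORDNET.get(tag[:1], default) for _, tag in tagged]


# The tagged list from the question, as produced by pos_tag.
tagged = [("I", "PRP"), ("am", "VBP"), ("watching", "VBG")]
print(to_wordnet_pos(tagged))  # ['n', 'v', 'v']
```

The list comprehension builds the whole result before returning, so there is no way to leave the loop early by accident. In real code you would probably use the wordnet constants as the dictionary values rather than string literals.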