How can I remove the part-of-speech tags from the results of chunking? I am using NLTK to do this. Currently I can only iterate over the chunks using this code:
import nltk

# Chunk rule: one or more verbs, optional adjectives, then a noun
ChunkGram = r"""Chunk: {<VB.?>+<JJ.?>*<NN.?>}"""
ChunkParser = nltk.RegexpParser(ChunkGram)

for i in sent_list:
    tagged = nltk.pos_tag(i)
    chunked = ChunkParser.parse(tagged)
    # Visit only the subtrees labelled 'Chunk'
    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
        print(subtree)
Let's say my results look like this:
(Chunk routing/VBG rework/NN build/NN)
(Chunk build/VBP instruction/NN schedule/NN lot/NN)
(Chunk based/VBN firm/NN plan/NN)
Expected Results:
'routing','rework','build'
OR
'routing rework build'
Would it be possible to do this? If not, please advise me on what I can do to extract these phrases.
I have found this code, which helped me get the results I want:
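# The label must match the rule name in the chunk grammar ('Chunk' in the grammar above).
for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Verb'):
    # leaves() yields (word, tag) pairs; keep only the words.
    verblist.append(" ".join(word for word, tag in subtree.leaves()))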
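For anyone who wants to try this without downloading a tagger model, here is a minimal sketch of the same idea. The hand-tagged tokens and the helper name chunk_words are my own assumptions for illustration, not part of NLTK; the only NLTK calls used are RegexpParser, Tree.subtrees, Tree.label and Tree.leaves.

```python
import nltk

# Hand-tagged tokens (made up for illustration) so nltk.pos_tag is not needed.
tagged = [("please", "UH"), ("build", "VB"), ("new", "JJ"),
          ("schedule", "NN"), ("today", "RB")]

# Same chunk rule as in the question.
parser = nltk.RegexpParser(r"""Chunk: {<VB.?>+<JJ.?>*<NN.?>}""")
chunked = parser.parse(tagged)


def chunk_words(tree, label="Chunk"):
    """Hypothetical helper: one list of plain words per chunk with this label."""
    return [
        [word for word, tag in subtree.leaves()]  # drop the POS tag
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


word_lists = chunk_words(chunked)                    # [['build', 'new', 'schedule']]
phrases = [" ".join(words) for words in word_lists]  # ['build new schedule']
print(word_lists)
print(phrases)
```

word_lists gives the first form I asked for (separate words per chunk) and phrases gives the second (one string per chunk). With real text you would replace the hand-tagged list with the output of nltk.pos_tag.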