python-3.x, nltk, text-processing, pos-tagger, chunking

Remove Part of Speech Tags after chunking


How can I remove the part-of-speech tags from the results of chunking? I am using NLTK. Currently I can only iterate over the chunks, using this code:

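import nltk

# sent_list is assumed to hold tokenized sentences (each a list of word strings)
ChunkGram = r"""Chunk: {<VB.?>+<JJ.?>*<NN.?>}"""
ChunkParser = nltk.RegexpParser(ChunkGram)

for i in sent_list:
    tagged = nltk.pos_tag(i)
    chunked = ChunkParser.parse(tagged)
    # Print only the subtrees produced by the 'Chunk' rule
    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
        print(subtree)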

Let's say my results look like this:

(Chunk routing/VBG rework/NN build/NN)
(Chunk build/VBP instruction/NN schedule/NN lot/NN)
(Chunk based/VBN firm/NN plan/NN)

Expected Results:

'routing','rework','build'

OR

'routing rework build'

Would it be possible to do this? If not, please advise me on how I can extract these phrases.


Solution

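  • I found this code, which gave me the results I wanted. The label in the filter must match the rule name in the grammar ('Chunk' in the question):

    verblist = []
    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
        # leaves() yields (word, tag) pairs; keep only the words
        verblist.append(" ".join(word for word, tag in subtree.leaves()))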
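
For anyone who wants to try this without downloading a tagger model, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the original post: the sentences are tagged by hand instead of with nltk.pos_tag, the helper name chunk_words is made up for illustration, and the grammar ends in <NN.?>+ so that several consecutive nouns land in one chunk, as in the sample output. It uses nltk.tag.untag to drop the tags, which does the same job as the list comprehension.

```python
import nltk
from nltk.tag import untag

# Assumption: hand-tagged (word, tag) pairs stand in for nltk.pos_tag output,
# so no tagger model has to be downloaded to run this sketch.
tagged_sents = [
    [("routing", "VBG"), ("rework", "NN"), ("build", "NN")],
    [("based", "VBN"), ("firm", "NN"), ("plan", "NN"), ("quickly", "RB")],
]

# Assumption: <NN.?>+ (one or more nouns) rather than a single <NN.?>.
grammar = r"""Chunk: {<VB.?>+<JJ.?>*<NN.?>+}"""
parser = nltk.RegexpParser(grammar)


def chunk_words(tagged_sent, label="Chunk"):
    """Hypothetical helper: return one list of plain words per matching chunk."""
    tree = parser.parse(tagged_sent)
    return [
        untag(subtree.leaves())  # strip the tag from each (word, tag) leaf
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


# Form 1: each chunk as a list of words.
word_lists = [words for sent in tagged_sents for words in chunk_words(sent)]

# Form 2: each chunk as a single space-joined string.
joined = [" ".join(words) for words in word_lists]

print(word_lists)
print(joined)
```

Either form can be collected per sentence instead of flattened if you need to know which sentence a phrase came from.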