python, regex, nltk, chunking

How to print only the string result of chunking with NLTK?


I'm using NLTK and a regular-expression chunk grammar to analyze my text. The parser correctly identifies the chunk that I defined, but the printed result contains every tagged word as well as the "My_Chunk" subtrees. How can I print only the chunked part of the text ("My_Chunk")?

Here is my code example:

import nltk

text = ['The absolutely kind professor asked students out whom he met in class']

# Chunk grammar: any adverbs, then any nouns, then a past-tense verb
chunk = r"""My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}"""
chunkParser = nltk.RegexpParser(chunk)

for item in text:
    tokenized = nltk.word_tokenize(item)
    tagged = nltk.pos_tag(tokenized)

    chunked = chunkParser.parse(tagged)
    print(chunked)
    chunked.draw()  # opens a window showing the tree

And the printed result is:

(S
  The/DT
  (My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
  students/NNS
  out/RP
  whom/WP
  he/PRP
  (My_Chunk met/VBD)
  in/IN
  class/NN)

Solution

  • Iterate over the top-level children of the parsed tree and print only the subtrees labeled "My_Chunk":

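    for a in chunked:
        # Chunks are Tree objects; unchunked words are plain (word, tag) tuples
        if isinstance(a, nltk.tree.Tree):
            if a.label() == "My_Chunk":
                print(a)
                # leaves() yields (word, tag) pairs; keep only the words
                print(" ".join([lf[0] for lf in a.leaves()]))
                print()

    # Output:
    #(My_Chunk absolutely/RB kind/NN professor/NN asked/VBD)
    #absolutely kind professor asked

    #(My_Chunk met/VBD)
    #met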
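
If the chunk text is needed as data rather than printed output, the same idea can be wrapped in a small helper. The following is only a sketch: the function name chunk_strings is made up for illustration, and the tagged tokens are hard-coded from the printed tree in the question (an assumption, so that it runs without downloading tokenizer or tagger data). It uses Tree.subtrees() with a filter instead of the isinstance check.

```python
import nltk

# Tagged tokens copied from the printed tree in the question (assumption:
# hard-coded here so no tokenizer or tagger data has to be downloaded).
tagged = [
    ("The", "DT"), ("absolutely", "RB"), ("kind", "NN"), ("professor", "NN"),
    ("asked", "VBD"), ("students", "NNS"), ("out", "RP"), ("whom", "WP"),
    ("he", "PRP"), ("met", "VBD"), ("in", "IN"), ("class", "NN"),
]

# Same grammar as in the question
chunk_parser = nltk.RegexpParser(r"My_Chunk: {<RB.?>*<NN.?>*<VBD.?>}")
chunked = chunk_parser.parse(tagged)


def chunk_strings(tree, label):
    """Return the words of every subtree with the given label, joined by spaces.

    chunk_strings is a hypothetical helper name, not part of NLTK.
    """
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


phrases = chunk_strings(chunked, "My_Chunk")
print(phrases)
# ['absolutely kind professor asked', 'met']
```

Unlike the loop above, subtrees() also descends into nested trees, which only matters if the grammar produces chunks inside other chunks.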