Tags: python, nlp, nltk, pos-tagger

NLTK: Getting rid of parentheses and POS tags in the output


I have this code.

import nltk

qry = "who is Ronald Avon"
tokens = nltk.tokenize.word_tokenize(qry)
pos = nltk.pos_tag(tokens)
# binary=False keeps the entity types (PERSON, GPE, ...) instead of a single NE label
sentt = nltk.ne_chunk(pos, binary=False)
person = []
# NLTK 3 exposes the chunk label as t.label(); NLTK 2 used t.node
for subtree in sentt.subtrees(filter=lambda t: t.label() == 'PERSON'):
    for leaf in subtree.leaves():
        person.append(leaf)
print "person=", person

It extracts the person names from a sentence. This is the result I get:

person= [('Ronald', 'NNP'), ('Avon', 'NNP')]

How do I get a result like this:

Ronald
Avon

without the 'NNP' and the parentheses. Thanks.


Solution

  • Use a list comprehension.

    To get a list of the names:

    names = [name for name, tag in person]
    

    To print the names in the format you asked for, one per line:

    # Python 2 (print is a statement)
    print "\n".join([name for name, tag in person])
    
    # Python 3 (print is a function)
    print("\n".join([name for name, tag in person]))
    

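    This is really a basic Python data structure question; it's not specific to NLTK. Here person is a list of (word, tag) tuples, and the comprehension unpacks each tuple and keeps only the word. You might find an introductory guide like An informal guide to Python useful.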
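
As a rough illustration of the tuple unpacking the answer relies on, here is a small Python 3 sketch that needs no NLTK at all. The `person` list is copied from the output in the question; the names `names_by_index`, `words` and `tags` are made up for this example.

```python
# Sample data copied from the question's output; NLTK is not needed here.
person = [('Ronald', 'NNP'), ('Avon', 'NNP')]

# Unpack each (word, tag) tuple and keep only the word.
names = [word for word, tag in person]

# Equivalent form that indexes into each tuple instead of unpacking it.
names_by_index = [pair[0] for pair in person]

# zip(*person) transposes the pairs into one tuple of words and one of tags.
words, tags = zip(*person)

# Print one name per line, as requested in the question.
print("\n".join(names))
```

All three forms give the same words. The unpacking version is usually the easiest to read, and the `zip` version may be handy if the tags are needed separately as well.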