
How to retrieve words with noun tags only from a file?


I need to retrieve only those words from a file whose POS tags are 'NN', 'NNP', 'NNS' or 'NNPS'. My sample input is:

  [['For,IN', ',,,', 'We,PRP', 'the,DT', 'divine,NN', 'caused,VBD', 'apostle,NN', 'We,PRP', 'vouchsafed,VBD', 'unto,JJ', 'Jesus,NNP', 'the,DT', 'son,NN', 'of,IN', 'Mary,NNP', 'all,DT', 'evidence,NN', 'of,IN', 'the,DT', 'truth,NN', ',,,', 'and,CC', 'strengthened,VBD', 'him,PRP', 'with,IN', 'holy,JJ'], [ 'be,VB', 'nor,CC', 'ransom,NN', 'taken,VBN', 'from,IN', 'them,PRP', 'and,CC', 'none,NN', '\n']]

My expected output is:

  ['divine', 'apostle', 'Jesus', 'son', 'Mary', 'evidence', 'truth', 'ransom', 'none']

Solution

  • Since your input is a list of lists, you could use a nested list comprehension:

    a_list = [['For,IN', ',,,', 'indeed,RB', ',,,', 'We,PRP', 'vouchsafed,VBD', 'unto,JJ', 'Moses,NNPS', 'the,DT', 'divine,NN', 'writ,NN', 'and,CC', 'caused,VBD', 'apostle,NN', 'after,IN', 'apostle,NN', 'to,TO', 'follow,VB', 'him,PRP', ';,:', 'and,CC', 'We,PRP', 'vouchsafed,VBD', 'unto,JJ', 'Jesus,NNP', ',,,', 'the,DT', 'son,NN', 'of,IN', 'Mary,NNP', ',,,', 'all,DT', 'evidence,NN', 'of,IN', 'the,DT', 'truth,NN', ',,,', 'and,CC', 'strengthened,VBD', 'him,PRP', 'with,IN', 'holy,JJ']]
    
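    # Penn Treebank noun tags, each prefixed with the comma that separates word and tag
    pos_tags = (',NN', ',NNP', ',NNS', ',NNPS')
    
    # str.endswith accepts a tuple and is True if any of the suffixes matches;
    # split(',')[0] keeps the word in front of the tag
    nouns = [s.split(',')[0] for sub in a_list for s in sub if s.endswith(pos_tags)]
    
    print(nouns)
    # ['Moses', 'divine', 'writ', 'apostle', 'apostle', 'Jesus', 'son', 'Mary', 'evidence', 'truth']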
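The comprehension reads like two nested `for` loops, outer loop first. Written out explicitly as a small function (the name `extract_nouns` and the shortened sample list are illustrative, not from the original answer), the same filter looks like this:

```python
def extract_nouns(tagged_sentences, pos_tags=(',NN', ',NNP', ',NNS', ',NNPS')):
    """Return the words whose 'word,TAG' string ends with one of pos_tags."""
    nouns = []
    for sub in tagged_sentences:           # outer loop: each inner list
        for s in sub:                      # inner loop: each 'word,TAG' string
            if s.endswith(pos_tags):       # endswith accepts a tuple of suffixes
                nouns.append(s.split(',')[0])  # keep the text before the first comma
    return nouns


a_list = [['Moses,NNPS', 'the,DT', 'divine,NN', ',,,', 'Jesus,NNP'],
          ['ransom,NN', 'taken,VBN', 'none,NN', '\n']]
print(extract_nouns(a_list))
# ['Moses', 'divine', 'Jesus', 'ransom', 'none']
```

The comprehension is just the compact form of this loop; the explicit version is easier to step through in a debugger or extend with extra conditions.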
    

Edit: with the exact input from the question:

    a_list = [['For,IN', ',,,', 'We,PRP', 'the,DT', 'divine,NN', 'caused,VBD', 'apostle,NN', 'We,PRP', 'vouchsafed,VBD', 'unto,JJ', 'Jesus,NNP', 'the,DT', 'son,NN', 'of,IN', 'Mary,NNP', 'all,DT', 'evidence,NN', 'of,IN', 'the,DT', 'truth,NN', ',,,', 'and,CC', 'strengthened,VBD', 'him,PRP', 'with,IN', 'holy,JJ'], ['be,VB', 'nor,CC', 'ransom,NN', 'taken,VBN', 'from,IN', 'them,PRP', 'and,CC', 'none,NN', '\n']]
    pos_tags = (',NN', ',NNP', ',NNS', ',NNPS')
    
    nouns = [s.split(',')[0] for sub in a_list for s in sub if s.endswith(pos_tags)]
    
    print(nouns)
    # ['divine', 'apostle', 'Jesus', 'son', 'Mary', 'evidence', 'truth', 'ransom', 'none']
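One limitation of `split(',')[0]` is that it cuts at the first comma, so it assumes the word itself never contains a comma. A slightly more defensive sketch (not part of the original answer) splits on the last comma with `str.rpartition` and checks the tag for exact membership in a set. The function name `extract_nouns_exact` and the token `'Washington,D.C.,NNP'` are made-up examples used only to show the difference.

```python
NOUN_TAGS = {'NN', 'NNP', 'NNS', 'NNPS'}


def extract_nouns_exact(tagged_sentences, tags=NOUN_TAGS):
    """Return words whose tag (the text after the last comma) is in tags."""
    nouns = []
    for sub in tagged_sentences:
        for token in sub:
            # rpartition splits at the LAST comma; it returns ('', '', token)
            # when there is no comma at all, e.g. for the '\n' entry
            word, sep, tag = token.rpartition(',')
            if sep and tag in tags:
                nouns.append(word)
    return nouns


a_list = [['Washington,D.C.,NNP', 'is,VBZ', 'a,DT', 'city,NN', ',,,', 'and,CC'],
          ['ransom,NN', '\n']]
print(extract_nouns_exact(a_list))
# ['Washington,D.C.', 'city', 'ransom']
```

On the input from the question it returns the same nine words as the comprehension above, since none of those words contain a comma.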