Tags: python, python-3.x, nlp, spacy

How to extract noun and adjective pairs including conjunctions


Background

I would like to extract noun-adjective pairs using an NLP library such as spaCy.

The expected inputs and outputs are shown below.

The pink, beautiful, and small flowers are blown away.
{'flowers':['pink', 'beautiful', 'small']}

I got a red candy and an interesting book.
{'candy':['red'], 'book':['interesting']}

Problem

Following the answer to a similar question, "How to extract noun adjective pairs from a sentence", I ran that program on my input.

However, it returned an empty list.

[]

Code

import spacy
nlp = spacy.load('en')
doc = nlp('The beautiful and small flowers are blown away.')
noun_adj_pairs = []
for i,token in enumerate(doc):
    if token.pos_ not in ('NOUN','PROPN'):
        continue
    for j in range(i+1,len(doc)):
        if doc[j].pos_ == 'ADJ':
            noun_adj_pairs.append((token,doc[j]))
            break
print(noun_adj_pairs)

Trial

I tried writing new code, but I'm still stuck on how to handle adjectives joined by conjunctions.

input
I got a red candy and an interesting book.
output
{'candy': 'red', 'book': 'interesting'}

input
The pink, beautiful, and small flowers are blown away.
output
{'flowers': 'small'}

trial code

import spacy
nlp = spacy.load('en')
doc = nlp('I got a red candy and an interesting book.')
noun_adj_pairs = {}
for word in doc:
    if word.pos_ == 'ADJ' and word.dep_ != "cc":
        if word.head.pos_ =="NOUN":
            noun_adj_pairs[str(word.head.text)]=str(word.text)

print(noun_adj_pairs)

Environment

Python 3.6


Solution

  • You may wish to try noun_chunks:

    import spacy
    nlp = spacy.load('en_core_web_sm')
    doc = nlp('I got a red candy and an interesting and big book.')
    
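    noun_adj_pairs = {}
    for chunk in doc.noun_chunks:
        adj = []
        noun = ""
        for tok in chunk:
            # the last noun in the chunk is taken as the one being described
            if tok.pos_ == "NOUN":
                noun = tok.text
            # collect every adjective in the chunk
            if tok.pos_ == "ADJ":
                adj.append(tok.text)
        # skip chunks without a noun, e.g. the pronoun "I"
        if noun:
            noun_adj_pairs.update({noun:adj})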
    
    # expected output
    noun_adj_pairs
    {'candy': ['red'], 'book': ['interesting', 'big']}
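
One caveat with the loop above: `update` replaces the list whenever the same noun turns up in a second chunk. Below is a minimal sketch of one way to merge instead, assuming that is the behaviour you want. `pairs_from_chunks` and `tok` are hypothetical names, and the chunks are hand-tagged stand-ins so the snippet runs without a model; with spaCy you would pass `doc.noun_chunks`.

```python
from collections import defaultdict
from types import SimpleNamespace


def pairs_from_chunks(chunks):
    """Group adjectives under the noun of each chunk, merging repeated nouns."""
    pairs = defaultdict(list)
    for chunk in chunks:
        adjectives = [t.text for t in chunk if t.pos_ == "ADJ"]
        nouns = [t.text for t in chunk if t.pos_ == "NOUN"]
        if nouns:
            # As in the loop above, the last noun in the chunk is the one described.
            pairs[nouns[-1]].extend(adjectives)
    return dict(pairs)


def tok(text, pos):
    # Hypothetical stand-in exposing only the two attributes the function reads.
    return SimpleNamespace(text=text, pos_=pos)


# Hand-tagged chunks (an assumption, not model output).
chunks = [
    [tok("a", "DET"), tok("red", "ADJ"), tok("candy", "NOUN")],
    [tok("a", "DET"), tok("sweet", "ADJ"), tok("candy", "NOUN")],
    [tok("an", "DET"), tok("interesting", "ADJ"), tok("book", "NOUN")],
]
merged = pairs_from_chunks(chunks)
print(merged)  # {'candy': ['red', 'sweet'], 'book': ['interesting']}
```

Whether repeated nouns should be merged or kept apart depends on the use case; keying on the token index rather than the text would keep them separate.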
    

    Should you wish to include conjunctions:

    noun_adj_pairs = {}
    for chunk in doc.noun_chunks:
        adj = []
        noun = ""
        for tok in chunk:
            if tok.pos_ == "NOUN":
                noun = tok.text
            if tok.pos_ == "ADJ" or tok.pos_ == "CCONJ":
                adj.append(tok.text)
        if noun:
            noun_adj_pairs.update({noun:" ".join(adj)})
    
    noun_adj_pairs
    {'candy': 'red', 'book': 'interesting and big'}
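
If you would rather stay with the dependency-based attempt from the question, here is a rough sketch of one possible fix: append to a list instead of overwriting the dictionary value, and follow `conj` links between adjectives up to the noun they modify. `governing_noun`, `noun_adj_pairs` and `make` are hypothetical names. The tokens are hand-built stand-ins, and their dependency labels are an assumed parse, not actual model output; a real parser may attach coordinated adjectives differently.

```python
from collections import defaultdict
from types import SimpleNamespace


def governing_noun(tok):
    """Climb adjective-to-adjective 'conj' links, then return the head word."""
    while tok.dep_ == "conj" and tok.head.pos_ == "ADJ":
        tok = tok.head
    return tok.head


def noun_adj_pairs(tokens):
    """Map each noun's text to the list of adjectives that modify it."""
    pairs = defaultdict(list)
    for tok in tokens:
        if tok.pos_ != "ADJ":
            continue
        noun = governing_noun(tok)
        if noun.pos_ in ("NOUN", "PROPN"):
            # Append rather than assign, so earlier adjectives are kept.
            pairs[noun.text].append(tok.text)
    return dict(pairs)


def make(text, pos, dep):
    # Hypothetical stand-in for a spaCy token (text, pos_, dep_, head only).
    return SimpleNamespace(text=text, pos_=pos, dep_=dep, head=None)


# Assumed parse of "pink, beautiful, and small flowers".
flowers = make("flowers", "NOUN", "ROOT")
pink = make("pink", "ADJ", "amod")
beautiful = make("beautiful", "ADJ", "conj")
small = make("small", "ADJ", "conj")
flowers.head = flowers  # a root token is its own head
pink.head = flowers
beautiful.head = pink
small.head = beautiful

result = noun_adj_pairs([pink, beautiful, small, flowers])
print(result)  # {'flowers': ['pink', 'beautiful', 'small']}
```

The same function should accept a parsed `doc` directly, since it only reads `pos_`, `dep_`, `head` and `text`, but the result then depends on how the loaded model parses the sentence.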