Tags: python-3.x, nlp, spacy, named-entity-recognition

Removing person names from noun chunks in spaCy


Is there a way to remove person names from spaCy's noun chunks?

Here is my code:

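import spacy

# noun_chunks needs a dependency parse; en_vectors_web_lg ships only word vectors,
# so a full pipeline such as en_core_web_lg has to be loaded instead
nlp = spacy.load('en_core_web_lg')
text = "John Smith is lookin for Apple ipod"
doc = nlp(text)
for chunk in doc.noun_chunks:
    print(chunk.text)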

Current output:

John Smith
Apple ipod

I would like output like the following, where people's names are ignored. How can I achieve this?

Apple ipod

Solution

  • Use spaCy's named entities (doc.ents) to skip noun chunks that are PERSON entities

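    import spacy
    # loading a full pipeline (tagger, parser, NER) so both noun_chunks and ents are available
    nlp = spacy.load('en_core_web_lg')
    doc = nlp('John Smith is lookin for Apple ipod')
    # collecting the token offsets of entities labelled PERSON
    fil = {(ent.start, ent.end) for ent in doc.ents if ent.label_ == 'PERSON'}
    # looping through noun chunks
    for chunk in doc.noun_chunks:
        # skipping chunks whose token range matches a person's name; comparing offsets
        # avoids Span equality, which in recent spaCy versions also compares labels
        if (chunk.start, chunk.end) not in fil:
            print(chunk.text)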
    

    Output:

    Apple ipod
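Matching exact (start, end) offsets only drops a chunk whose boundaries coincide with the PERSON entity, so a chunk like "Mr. John Smith" (if NER tags only "John Smith") could slip through. Below is a minimal sketch of a looser rule: drop any chunk that shares at least one token with a PERSON entity. It relies only on the .text, .start, .end and .label_ attributes that spaCy's Span provides. The helper names (overlaps, chunks_without_people) and the FakeSpan stand-in are hypothetical, used so the sketch runs without downloading a model, and the token offsets are written by hand rather than produced by en_core_web_lg.

```python
from collections import namedtuple
from typing import Iterable, List

# Hypothetical stand-in for spaCy's Span so this sketch runs without a model;
# a real Span exposes the same .text, .start, .end and .label_ attributes.
FakeSpan = namedtuple("FakeSpan", ["text", "start", "end", "label_"])


def overlaps(a, b) -> bool:
    """True if token ranges [a.start, a.end) and [b.start, b.end) share a token."""
    return a.start < b.end and b.start < a.end


def chunks_without_people(noun_chunks: Iterable, ents: Iterable) -> List[str]:
    """Return the text of every noun chunk that shares no token with a PERSON entity."""
    people = [ent for ent in ents if ent.label_ == "PERSON"]
    return [
        chunk.text
        for chunk in noun_chunks
        if not any(overlaps(chunk, person) for person in people)
    ]


# Hand-written offsets for "John Smith is lookin for Apple ipod"
# (tokens: 0 John, 1 Smith, 2 is, 3 lookin, 4 for, 5 Apple, 6 ipod).
chunks = [FakeSpan("John Smith", 0, 2, "NP"), FakeSpan("Apple ipod", 5, 7, "NP")]
ents = [FakeSpan("John Smith", 0, 2, "PERSON"), FakeSpan("Apple", 5, 6, "ORG")]
print(chunks_without_people(chunks, ents))  # ['Apple ipod']

# A chunk wider than the entity is still dropped, unlike an exact-offset match.
titled = [FakeSpan("Mr. John Smith", 0, 3, "NP")]
titled_ents = [FakeSpan("John Smith", 1, 3, "PERSON")]
print(chunks_without_people(titled, titled_ents))  # []
```

With a loaded pipeline the same helper can be called on spaCy objects directly, e.g. chunks_without_people(doc.noun_chunks, doc.ents). Note that this drops the whole chunk: if the parser returns something like "John Smith's ipod" as one chunk, it would be removed as well, so whether overlap or exact matching is preferable depends on the use case.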
    

    Hope this helps.