python nlp spacy information-extraction named-entity-recognition

spaCy coreference resolution - named entity recognition (NER) to return unique entity IDs?


Perhaps I've skipped over part of the docs, but I am trying to get a unique ID for each entity from the standard NER toolset. For example:

import spacy
from spacy import displacy
import en_core_web_sm
nlp = en_core_web_sm.load()

text = "This is a text about Apple Inc based in San Fransisco. "\
        "And here is some text about Samsung Corp. "\
        "Now, here is some more text about Apple and its products for customers in Norway"

doc = nlp(text)

for ent in doc.ents:
    print('ID:{}\t{}\t"{}"\t'.format(ent.label, ent.label_, ent.text))


displacy.render(doc, jupyter=True, style='ent')

returns:

ID:381    ORG "Apple Inc" 
ID:382    GPE "San Fransisco" 
ID:381    ORG "Samsung Corp." 
ID:381    ORG "Apple" 
ID:382    GPE "Norway"

I have been looking at ent.ent_id and ent.ent_id_, but these are inactive according to the docs. I couldn't find anything in ent.root either. The ent.label value printed above identifies the label (ORG, GPE), not the individual entity.

For example, in GCP NLP each entity is returned with an ⟨entity⟩number that enables you to identify multiple instances of the same entity within a text.

This is a ⟨text⟩2 about ⟨Apple Inc⟩1 based in ⟨San Fransisco⟩4. And here is some ⟨text⟩3 about ⟨Samsung Corp⟩6. Now, here is some more ⟨text⟩8 about ⟨Apple⟩1 and its ⟨products⟩5 for ⟨customers⟩7 in ⟨Norway⟩9

Does spaCy support something similar? Or is there a way using NLTK or Stanford?


Solution

  • You can use the neuralcoref library to add coreference resolution to spaCy's models:

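    # Load your usual spaCy model (one of the spaCy English models).
    # 'en' is a shortcut link; the full package name used in the question also works.
    import spacy
    nlp = spacy.load('en_core_web_sm')
    
    # Add NeuralCoref to spaCy's pipeline
    import neuralcoref
    neuralcoref.add_to_pipe(nlp)
    
    # You're done. You can now use NeuralCoref like any other spaCy document annotation.
    doc = nlp('My sister has a dog. She loves him.')
    
    print(doc._.has_coref)       # whether any coreference was resolved in the doc
    print(doc._.coref_clusters)  # the clusters of mentions that refer to the same thing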
    

    Find the installation and usage instructions here: https://github.com/huggingface/neuralcoref
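
To get from coreference clusters to the per-entity IDs asked about in the question, one option is to give every entity the index of the cluster it belongs to and hand out fresh IDs to entities that are in no cluster. The following is only a minimal sketch of that idea, not part of neuralcoref: `entity_ids` is a hypothetical helper, and it relies only on the `start`/`end`/`text` attributes of spans and the `i`/`mentions` attributes of clusters. So that it runs without any model installed, the demo uses stand-in objects with made-up token offsets and a made-up cluster linking "Apple Inc" and "Apple"; whether neuralcoref actually produces that cluster for this text is an assumption.

```python
from types import SimpleNamespace


def entity_ids(ents, clusters):
    """Return (entity_id, text) pairs, sharing one ID per coreference cluster.

    `ents` stands in for doc.ents and `clusters` for doc._.coref_clusters.
    Hypothetical helper: entities are matched to mentions by exact token span.
    """
    clusters = list(clusters or [])

    # Index every coreference mention by its (start, end) token offsets.
    span_to_cluster = {}
    for cluster in clusters:
        for mention in cluster.mentions:
            span_to_cluster[(mention.start, mention.end)] = cluster.i

    # Entities outside any cluster get fresh IDs after the last cluster index.
    next_free = max((c.i for c in clusters), default=-1) + 1

    result = []
    for ent in ents:
        key = (ent.start, ent.end)
        if key in span_to_cluster:
            ent_id = span_to_cluster[key]
        else:
            ent_id = next_free
            next_free += 1
        result.append((ent_id, ent.text))
    return result


def span(start, end, text):
    """Stand-in for a spaCy Span (illustrative token offsets only)."""
    return SimpleNamespace(start=start, end=end, text=text)


apple_inc = span(5, 7, "Apple Inc")
san_fransisco = span(9, 11, "San Fransisco")
samsung = span(18, 20, "Samsung Corp.")
apple = span(28, 29, "Apple")
norway = span(35, 36, "Norway")

# Assumed clustering: both Apple mentions end up in cluster 0.
demo_clusters = [SimpleNamespace(i=0, mentions=[apple_inc, apple])]
demo_ents = [apple_inc, san_fransisco, samsung, apple, norway]

result = entity_ids(demo_ents, demo_clusters)
for ent_id, text in result:
    print('ID:{}\t"{}"'.format(ent_id, text))
```

With a real pipeline the call would look like `entity_ids(doc.ents, doc._.coref_clusters)`. Note that the exact-span match is a simplification: a coreference mention can be longer than the named entity inside it, in which case a containment check on the offsets would be needed instead.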