I am very new to ML and to spaCy in general. I am trying to extract and show the named entities in an input text.
This is my method:
    from collections import defaultdict
    from pprint import pprint

    import spacy

    def run():
        nlp = spacy.load('en_core_web_sm')
        sentence = "Hi my name is Oliver!"
        doc = nlp(sentence)
        # Threshold for the confidence scores.
        threshold = 0.2
        beams = nlp.entity.beam_parse(
            [doc], beam_width=16, beam_density=0.0001)
        # Sum the scores of all beam parses that contain a given entity.
        entity_scores = defaultdict(float)
        for beam in beams:
            for score, ents in nlp.entity.moves.get_beam_parses(beam):
                for start, end, label in ents:
                    entity_scores[(start, end, label)] += score
        # Create a dict to store the output.
        ners = defaultdict(list)
        ners['text'] = str(sentence)
        for key in entity_scores:
            start, end, label = key
            score = entity_scores[key]
            if score > threshold:
                ners['extractions'].append({
                    "label": str(label),
                    "text": str(doc[start:end]),
                    "confidence": round(score, 2)
                })
        pprint(ners)
The above method works fine, and will print something like:
    defaultdict(<class 'list'>,
                {'extractions': [{'confidence': 1.0,
                                  'label': 'PERSON',
                                  'text': 'Oliver'}],
                 'text': 'Hi my name is Oliver'})
So far so good. Now I am trying to get the actual character position of the named entity that was found, in this case "Oliver".
Looking at the documentation, ent.start_char and ent.end_char are available, but if I use them like this:
    "start_position": doc.start_char,
    "end_position": doc.end_char
I get the following error:
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'start_char'
Can someone guide me in the right direction?
So I actually found an answer right after posting this question (typical).
I found that I didn't need to save the information into entity_scores, but could instead just iterate over the entities that were actually found. I ended up adding for ent in doc.ents: instead, and this gives me access to all the standard spaCy attributes. See below:
    ners = defaultdict(list)
    ners['text'] = str(sentence)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for ent in doc.ents:
                if score > threshold:
                    ners['extractions'].append({
                        "label": str(ent.label_),
                        "text": str(ent.text),
                        "confidence": round(score, 2),
                        "start_position": ent.start_char,
                        "end_position": ent.end_char
                    })
My entire method ends up looking like this:
    def run():
        nlp = spacy.load('en_core_web_sm')
        sentence = "Hi my name is Oliver!"
        doc = nlp(sentence)
        threshold = 0.2
        beams = nlp.entity.beam_parse(
            [doc], beam_width=16, beam_density=0.0001)
        ners = defaultdict(list)
        ners['text'] = str(sentence)
        for beam in beams:
            for score, ents in nlp.entity.moves.get_beam_parses(beam):
                for ent in doc.ents:
                    if score > threshold:
                        ners['extractions'].append({
                            "label": str(ent.label_),
                            "text": str(ent.text),
                            "confidence": round(score, 2),
                            "start_position": ent.start_char,
                            "end_position": ent.end_char
                        })
        pprint(ners)
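One caveat with the loop above: every entity in doc.ents gets paired with the score of every parse, so an entity can be listed several times, with scores that belong to whole parses rather than to that entity. A possible alternative, sketched here, keeps the summed entity_scores from the question and turns each (start, end) token range into a Span with doc[start:end], since start_char and end_char are attributes of Span objects rather than of the Doc. The helper name extractions_from_scores and its parameters are made up for illustration, and it assumes entity_scores is keyed by (start, end, label) as in the question.

```python
def extractions_from_scores(doc, entity_scores, threshold=0.2):
    """Build the 'extractions' list from per-entity beam scores.

    Assumes `doc` supports doc[start:end] and returns a span exposing
    .text, .start_char and .end_char (as a spaCy Doc does), and that
    `entity_scores` maps (start, end, label) token ranges to summed scores.
    """
    extractions = []
    for (start, end, label), score in entity_scores.items():
        if score > threshold:
            span = doc[start:end]  # token indices -> Span
            extractions.append({
                "label": str(label),
                "text": span.text,
                "confidence": round(score, 2),
                "start_position": span.start_char,  # character offsets
                "end_position": span.end_char,
            })
    return extractions
```

With the variables from the question, this would be used as ners['extractions'] = extractions_from_scores(doc, entity_scores, threshold), so each entity appears once with its own summed score.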