How to extract sentences from one text with only 1 named entity using spaCy?


I have a list of sentences and I want to be able to append only the sentences with 1 "PERSON" named entity using spaCy. The code I used was as follows:

test_list = []
for item in sentences: #for each sentence in 'sentences' list
  for ent in item.ents: #for each entity in the sentence's entities 
    if len(ent in item.ents) == 1: #if there is only one entity
      if ent.label_ == "PERSON": #and if the entity is a "PERSON"
        test_list.append(item) #put the sentence into 'test_list'

But then I get:

TypeError: object of type 'bool' has no len()

Am I doing this wrong? How exactly would I complete this task?


Solution

  • You get the error because ent in item.ents is a membership test that returns a boolean (True or False), and a boolean has no length.

    What you want is

    test_list = []
    for item in sentences: #for each sentence in 'sentences' list
        if len(item.ents) == 1 and item.ents[0].label_ == "PERSON": #if there is only one entity and if the entity is a "PERSON"
            test_list.append(item) #put the sentence into 'test_list'
    

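    len(item.ents) == 1 checks that exactly one entity was detected in the sentence, and item.ents[0].label_ == "PERSON" checks that this entity's label is PERSON.

    Note the and operator: both conditions must be met. Because and short-circuits, item.ents[0] is only evaluated after the length check has passed, so a sentence with no entities cannot raise an IndexError.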
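
The question can be read two ways: keep sentences whose only entity is a PERSON (what the answer above does), or keep sentences that contain exactly one PERSON regardless of other entity types. The sketch below shows both filters side by side. The function names and the fake_sentence helper are hypothetical and not part of spaCy; the helper builds stand-in objects with the same .ents and .label_ attributes so the example runs without a model installed.

```python
from types import SimpleNamespace


def only_entity_is_person(sentences):
    """Keep sentences with exactly one entity, and that entity is a PERSON."""
    return [
        s for s in sentences
        if len(s.ents) == 1 and s.ents[0].label_ == "PERSON"
    ]


def exactly_one_person(sentences):
    """Keep sentences with exactly one PERSON entity; other labels are ignored."""
    return [
        s for s in sentences
        if sum(e.label_ == "PERSON" for e in s.ents) == 1
    ]


def fake_sentence(text, *labels):
    """Hypothetical stand-in for a spaCy sentence span (only .text and .ents)."""
    ents = tuple(SimpleNamespace(label_=label) for label in labels)
    return SimpleNamespace(text=text, ents=ents)


sample = [
    fake_sentence("Alice left.", "PERSON"),
    fake_sentence("Alice moved to Paris.", "PERSON", "GPE"),
    fake_sentence("Alice met Bob.", "PERSON", "PERSON"),
    fake_sentence("It rained."),
]

strict = [s.text for s in only_entity_is_person(sample)]
loose = [s.text for s in exactly_one_person(sample)]
```

With real data, and assuming a loaded pipeline that provides both sentence boundaries and NER (for example nlp = spacy.load("en_core_web_sm")), the same functions should accept list(nlp(text).sents), since each sentence span exposes .ents.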