python nlp token spacy named-entity-recognition

Extract Only Certain Named Entities From Tokens


Quick question (hopefully). Is it possible to get the named entities of the tokens, except for the ones with the CARDINAL label (the label ID is 397)? Here is my code:

import spacy

spacy_model = spacy.load('en_core_web_lg')  # model package names use underscores, not hyphens
with open('temp.txt') as f:
    tokens = spacy_model(f.read())
named_entities = tokens.ents  # Except the entities whose label is CARDINAL (397)

Is this possible? Any help would be greatly appreciated.


Solution

  • You can filter the entities with a list comprehension that compares each entity's string label (label_; the integer ID is in label):

    named_entities = [t for t in tokens.ents if t.label_ != 'CARDINAL']
    

    Here is a test:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    tokens = nlp('The basket costs $10. I bought 6.')
    print([(ent.text, ent.label_) for ent in tokens.ents])
    # => [('10', 'MONEY'), ('6', 'CARDINAL')]
    print([t for t in tokens.ents if t.label_ != 'CARDINAL'])
    # => [10]
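
If more than one label needs to be excluded, the same comprehension can test membership in a set. The sketch below is only an illustration: drop_labels is a hypothetical helper name, and the entities are plain stand-in objects with text and label_ attributes so the snippet runs without downloading a model. With spaCy you would pass tokens.ents instead.

```python
from types import SimpleNamespace


def drop_labels(ents, excluded):
    """Return the entities whose string label is not in `excluded`."""
    excluded = set(excluded)
    return [ent for ent in ents if ent.label_ not in excluded]


# Stand-ins for spaCy entity spans (assumption: only .text and .label_ are needed).
ents = [
    SimpleNamespace(text="10", label_="MONEY"),
    SimpleNamespace(text="6", label_="CARDINAL"),
    SimpleNamespace(text="first", label_="ORDINAL"),
]

kept = drop_labels(ents, {"CARDINAL", "ORDINAL"})
print([(ent.text, ent.label_) for ent in kept])
# => [('10', 'MONEY')]
```

Comparing the string label_ rather than the integer ID keeps the filter readable and avoids hard-coding a number such as 397.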