python, spacy, spacy-3

spaCy not picking up all ORG tags in sentence


I am using spaCy to try and pick up ORG entity tags in sentences. However, it is not picking up all of the tags, and the ones that it does pick up vary depending on how the organization name is written. For example:

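import spacy
from spacy import displacy

# Assumption: the question does not show how `nlp` was created; a small English model is loaded here.
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple, Microsoft, Google, and Facebook are all techo companies from the USA")
displacy.render(doc, style='ent')    # I am using `.render` as I am in a notebook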

generates output that clearly misses Facebook,

while

import spacy
from spacy import displacy

doc = nlp("Apple, Microsoft Inc, Google, and Facebook are all techo companies from the USA")
displacy.render(doc, style='ent')

generates output that now misses both Google and Facebook.

Any ideas as to what I am doing wrong?


Solution

  • You aren't doing anything wrong; the models just aren't perfect. See this issue on GitHub, which explains that this is just part of how statistical models work.

    Note that, for me, your examples seem to work as expected with the latest large English model (en_core_web_lg).
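If a handful of organization names must always be tagged, one possible workaround (not part of the linked issue, just a sketch) is to add rule-based patterns with spaCy's `entity_ruler` component. The helper names `org_entities` and `with_org_rules` below are hypothetical, and the list of names is only an illustration. A blank English pipeline is used so the example runs without downloading a model; with a trained pipeline installed, something like `spacy.load("en_core_web_lg")` could be passed in instead.

```python
import spacy


def org_entities(nlp, text):
    """Return the text of every entity labelled ORG in `text`."""
    return [ent.text for ent in nlp(text).ents if ent.label_ == "ORG"]


def with_org_rules(nlp, names):
    """Add an entity ruler that tags each of the given names as ORG."""
    # If the pipeline has a statistical NER, put the ruler before it so
    # the rule-based spans are set first and the model works around them.
    kwargs = {"before": "ner"} if "ner" in nlp.pipe_names else {}
    ruler = nlp.add_pipe("entity_ruler", **kwargs)
    ruler.add_patterns([{"label": "ORG", "pattern": name} for name in names])
    return nlp


text = "Apple, Microsoft Inc, Google, and Facebook are all techo companies from the USA"

# spacy.blank("en") has no statistical NER, so only the rules fire here.
nlp = with_org_rules(
    spacy.blank("en"),
    ["Apple", "Microsoft Inc", "Google", "Facebook"],  # illustrative list
)
print(org_entities(nlp, text))
```

The same `org_entities` helper can be called with different loaded pipelines to compare which ORG entities each model finds in a sentence.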