I am using spaCy to detect ORG entities in sentences. However, it does not pick up all of them, and which ones it finds varies depending on how the organization names are written. For example:
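```python
import spacy
from spacy import displacy

# Model name assumed: the original snippet did not show how `nlp` was created
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple, Microsoft, Google, and Facebook are all techo companies from the USA")
displacy.render(doc, style='ent')  # I am using `.render` as I am in a notebook
```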
This produces output that clearly misses `Facebook`, while
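```python
import spacy
from spacy import displacy

# Model name assumed: the original snippet did not show how `nlp` was created
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple, Microsoft Inc, Google, and Facebook are all techo companies from the USA")
displacy.render(doc, style='ent')
```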
produces output that now misses both `Google` and `Facebook`.
Any ideas as to what I am doing wrong?
You aren't doing anything wrong; the models just aren't perfect. See this issue on GitHub, which explains that occasional misses like these are an inherent part of how statistical models work.
Note that, for me, both of your examples work as expected with the latest large English model (`en_core_web_lg`).
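If you want to check this yourself, a minimal sketch like the one below runs both sentences through more than one pipeline and prints only the ORG entities, which is easier to compare than the `displacy` rendering. It assumes spaCy v3 and that both `en_core_web_sm` and `en_core_web_lg` have been installed (for example with `python -m spacy download en_core_web_lg`); the helper name `org_entities` is hypothetical, and the exact entities you get will depend on the model versions you have.

```python
import spacy

# The two sentences from the question, unchanged
SENTENCES = [
    "Apple, Microsoft, Google, and Facebook are all techo companies from the USA",
    "Apple, Microsoft Inc, Google, and Facebook are all techo companies from the USA",
]


def org_entities(nlp, text):
    """Hypothetical helper: return the text of every entity the pipeline labels ORG."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ == "ORG"]


if __name__ == "__main__":
    # Assumes both models are installed; results vary between model versions
    for model_name in ("en_core_web_sm", "en_core_web_lg"):
        nlp = spacy.load(model_name)
        for sentence in SENTENCES:
            print(model_name, org_entities(nlp, sentence))
```

Filtering on `ent.label_` also shows when a name was tagged but with a different label, which you can see by dropping the `if` clause.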