I have long texts from which I need to extract nouns. I use spaCy as follows:
import spacy

nlp = spacy.load("en_core_web_lg")  # for better named entity detection
doc = nlp(text)

nouns = []
for token in doc:
    if token.tag_ == 'NN' or token.tag_ == 'NNP':
        nouns.append(token.lemma_)  # store token.lemma_

entities = []
for ent in doc.ents:
    entities.append(ent.text)  # store ent.text
However, it is very slow, as spaCy does the full analysis, which I do not need. Can I speed up spaCy for this specific job?
You can speed spaCy up by disabling the pretrained pipes that you don't need:
with nlp.disable_pipes("tagger", "parser"):
    doc = nlp(text)  # your code
(Note that if you still want to access token.tag_, you can't disable the tagger.)
Or you could even avoid loading these components altogether:
nlp = spacy.load("en_core_web_lg", disable=["tagger", "parser"])
Even disabling only the parser should give you a noticeable speed boost.
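Since the question needs both token.tag_ and doc.ents, a minimal sketch for that case might disable only the parser and keep the tagger and the entity recognizer. The helper names load_pipeline and extract_nouns_and_entities are made up for illustration, and the sketch assumes the en_core_web_lg model is installed:

```python
def load_pipeline(model_name="en_core_web_lg"):
    """Load a spaCy pipeline without the dependency parser (illustrative helper)."""
    import spacy  # imported here so the extraction helper below works without it

    # Keep the tagger (needed for token.tag_) and the NER component
    # (needed for doc.ents); drop only the parser.
    return spacy.load(model_name, disable=["parser"])


def extract_nouns_and_entities(doc):
    """Return (noun lemmas, entity texts) for an already processed doc."""
    nouns = [token.lemma_ for token in doc if token.tag_ in ("NN", "NNP")]
    entities = [ent.text for ent in doc.ents]
    return nouns, entities


# Usage, assuming `text` holds one of the long input texts:
#   nlp = load_pipeline()
#   nouns, entities = extract_nouns_and_entities(nlp(text))
```

How much time this saves depends on the texts and on the spaCy version, so it is worth timing with and without the parser on a sample of your own data.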
For more information, see here.