nlp spacy

Natural Language Processing for fast detection of nouns


I have long texts from which I need to extract nouns. I use spaCy as follows:

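import spacy

nlp = spacy.load("en_core_web_lg")  # large model, for better named entity detection
doc = nlp(text)  # text: the long input string

nouns = []
for token in doc:
    # NN = common noun (singular), NNP = proper noun (singular)
    if token.tag_ in ("NN", "NNP"):
        nouns.append(token.lemma_)

entities = [ent.text for ent in doc.ents]  # named entities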

However, it is very slow, as spaCy does the full analysis, which I do not need.

Can I speed up spaCy for this specific job?


Solution

  • You can speed spaCy up by disabling the pretrained pipes that you don't need:

    with nlp.disable_pipes("tagger", "parser"):
       # your code
    

    (Note that if you still want to access token.tag_, you can't disable the tagger.)

    Or you could even avoid loading these components altogether:

    nlp = spacy.load("en_core_web_lg", disable=["tagger", "parser"])
    

    Disabling only the parser should already give you a speed boost.

    For more information, see the spaCy documentation on disabling pipeline components.