Natural Language Processing for fast detection of nouns

I have long texts from which I need to extract nouns. I use spaCy as follows:

import spacy

nlp = spacy.load("en_core_web_lg")  # large model, for better named-entity detection
doc = nlp(text)
nouns, entities = [], []
for token in doc:
    if token.tag_ in ("NN", "NNP"):
        nouns.append(token.lemma_)  # store the lemma of each noun
for ent in doc.ents:
    entities.append(ent.text)  # store the text of each named entity

However, this is very slow, because spaCy runs its full analysis pipeline, most of which I do not need.

Can I speed up spaCy for this specific job?

Solution

You can speed spaCy up by disabling the pretrained pipes that you don't need:

with nlp.disable_pipes("tagger", "parser"):
    doc = nlp(text)  # your code goes here

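(Note that if you still need token.tag_, as in the question, you can't disable the tagger; disable only the parser in that case. In spaCy v3, disable_pipes is deprecated in favor of nlp.select_pipes(disable=[...]).)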

Or you could even avoid loading these components altogether:

nlp = spacy.load("en_core_web_lg", disable=["tagger", "parser"])

Disabling just the parser should already give you a noticeable speed boost.
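
Putting this together for the question's use case, here is a minimal sketch that keeps the tagger and the entity recognizer (both are needed for token.tag_ and doc.ents) and skips only the parser. The helper names extract_nouns_and_entities and process_texts are made up for illustration, and batching with nlp.pipe is an extra suggestion that is not part of the answer above; how much time it saves depends on your texts and hardware.

```python
NOUN_TAGS = {"NN", "NNP"}  # the same two tags the question checks for


def extract_nouns_and_entities(doc):
    """Return (noun lemmas, entity texts) for one processed document."""
    nouns = [token.lemma_ for token in doc if token.tag_ in NOUN_TAGS]
    entities = [ent.text for ent in doc.ents]
    return nouns, entities


def process_texts(texts, model="en_core_web_lg", batch_size=32):
    """Yield (nouns, entities) per text, running the pipeline without the parser.

    batch_size=32 is an arbitrary illustrative value, not a tuned setting.
    """
    import spacy  # imported lazily so the helper above works without spaCy

    # The parser is never loaded; the tagger and NER stay active.
    nlp = spacy.load(model, disable=["parser"])
    # nlp.pipe processes the texts in batches instead of one call per text.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield extract_nouns_and_entities(doc)
```

For example, list(process_texts(my_texts)) would return one (nouns, entities) pair per input text, assuming my_texts is a list of strings and the en_core_web_lg model is installed.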

For more information, see the spaCy documentation on disabling pipeline components.
