nlp nltk named-entity-recognition reinforcement-learning

NLTK NER: Continuous Learning

I have been trying to use the NER feature of NLTK to extract entities from articles. I know it cannot be perfect at this, but I wonder: if a human steps in to manually tag NEs, will the results improve?

If so, is it possible to continually train the present NLTK model (semi-supervised training)?

Solution

The plain vanilla NER chunker provided in NLTK internally uses a maximum entropy chunker trained on the ACE corpus. Hence it cannot identify dates or times unless you train it with your own classifier and data (which is quite a meticulous job).

You could refer to this link for doing the same.
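To give a rough idea of what "train it with your own classifier and data" can look like, here is a minimal sketch, not the built-in chunker's own training code. It assumes you already have hand-tagged sentences as (word, POS tag, IOB tag) triples; the ANNOTATED data, the feature set, and the helper names token_features, train and tag are all made up for illustration. It also swaps in NaiveBayesClassifier instead of a maximum entropy model to keep the example dependency-free. As far as I know, NLTK's classifiers are trained in batch, so "continual" learning here simply means appending the human-corrected sentences and retraining.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.chunk import conlltags2tree

# Hypothetical hand-tagged data: (word, POS tag, IOB entity tag) per token.
ANNOTATED = [
    [("Meeting", "NN", "O"), ("on", "IN", "O"), ("Monday", "NNP", "B-DATE")],
    [("Alice", "NNP", "B-PERSON"), ("left", "VBD", "O"),
     ("on", "IN", "O"), ("Friday", "NNP", "B-DATE")],
    [("Bob", "NNP", "B-PERSON"), ("arrives", "VBZ", "O"),
     ("on", "IN", "O"), ("Tuesday", "NNP", "B-DATE")],
]


def token_features(sent, i):
    """Illustrative features for token i; works on 2- or 3-tuples."""
    word, pos = sent[i][0], sent[i][1]
    prev_word = sent[i - 1][0].lower() if i > 0 else "<START>"
    return {
        "word": word.lower(),
        "pos": pos,
        "is_title": word.istitle(),
        "prev_word": prev_word,
    }


def train(annotated):
    """Batch-train a token classifier on all annotated sentences."""
    featuresets = [
        (token_features(sent, i), sent[i][2])
        for sent in annotated
        for i in range(len(sent))
    ]
    return NaiveBayesClassifier.train(featuresets)


def tag(classifier, pos_tagged):
    """Predict an IOB tag for every (word, POS) pair in a sentence."""
    iob = [
        classifier.classify(token_features(pos_tagged, i))
        for i in range(len(pos_tagged))
    ]
    return [(w, p, t) for (w, p), t in zip(pos_tagged, iob)]


if __name__ == "__main__":
    clf = train(ANNOTATED)
    sentence = [("Call", "VB"), ("Bob", "NNP"), ("on", "IN"), ("Monday", "NNP")]
    tagged = tag(clf, sentence)
    print(conlltags2tree(tagged))  # chunk tree built from the IOB tags

    # Human-in-the-loop step: fix the tags by hand, add them, retrain.
    corrected = [("Call", "VB", "O"), ("Bob", "NNP", "B-PERSON"),
                 ("on", "IN", "O"), ("Monday", "NNP", "B-DATE")]
    clf = train(ANNOTATED + [corrected])
```

With only a handful of sentences the predictions will be poor; the point is the loop of tagging, correcting and retraining, which should help as the amount of corrected data grows.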

Also, there is a module called timex in nltk_contrib which might help you with your needs.
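If you only need dates and times, a rule-based pass is often enough. The following is a minimal sketch of that idea using plain regular expressions; it is not the timex module's API, and the patterns and the find_time_expressions helper are assumptions chosen for illustration. They cover only a few English formats.

```python
import re

MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
WEEKDAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"

# (label, compiled pattern) pairs; illustrative, far from exhaustive.
PATTERNS = [
    ("DATE", re.compile(rf"\b\d{{1,2}} {MONTH} \d{{4}}\b")),    # 12 March 2021
    ("DATE", re.compile(rf"\b{MONTH} \d{{1,2}}, \d{{4}}\b")),   # March 12, 2021
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),             # 2021-03-12
    ("DATE", re.compile(rf"\b{WEEKDAY}\b")),                    # Monday
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}(?: ?[ap]m)?\b", re.IGNORECASE)),
]


def find_time_expressions(text):
    """Return (start, end, label, matched text) tuples in text order."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))
    spans.sort()
    return spans


if __name__ == "__main__":
    text = "The meeting is on 12 March 2021 at 10:30 am, moved from 2021-03-10."
    for start, end, label, value in find_time_expressions(text):
        print(label, value)
```

The spans returned here could be merged with the output of the NLTK chunker, so that dates and times are tagged by rules while the other entity types come from the trained model.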

If you would rather do this in Java, look into Stanford SUTime, which is part of Stanford CoreNLP.
