java, nlp, opennlp, named-entity-recognition

How to modify or retrain existing OpenNLP models?


Is there any way to retrain the existing OpenNLP models, i.e. to append new items to the pre-trained models that ship with OpenNLP?

Suppose I want to add a few new entries to the existing en-ner-date.bin because some words are not being detected as dates.

Note: I don't want to create a new model. I just want to modify the existing one.

I have seen something like a model builder add-on, but there is no concrete example of how to use it.

Any help will be appreciated.


Solution

  • You cannot simply manipulate existing binary OpenNLP model files. You have to train your own model(s) with the specific capabilities you need, that is, detecting the kinds of named entities seen in the text samples of (your) training data. See the hint on the OpenNLP model download page:

    The models can be used for testing or getting started. Please train your own models for all other use cases.

    Moreover, quoting the Apache OpenNLP Developer Manual:

    The pre-trained models might not be available for a desired language, can not detect important entities or the performance is not good enough outside the news domain. These are the typical reason to do custom training of the name finder on a new corpus or on a corpus which is extended by private training data taken from the data which should be analyzed.

    For further details, see the section Name Finder Training.
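As a rough illustration of what extending a corpus with your own training data involves: the name finder trainer reads plain text with one tokenized sentence per line, tokens separated by spaces, and each entity wrapped in `<START:type> ... <END>` markers. The sketch below is a hypothetical helper (`NameSampleLine` and its `Span` class are not part of OpenNLP) that renders a tokenized sentence plus entity spans in that format. You could use it to annotate sentences whose dates were missed and add them to your training file. It assumes the tokens come from the same tokenizer you will use at detection time.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not an OpenNLP class): renders one tokenized sentence
 * in the name finder training format, e.g.
 * "due <START:date> next Friday <END> ."
 */
public class NameSampleLine {

    /** Hypothetical annotation covering tokens [start, end) with an entity type. */
    public static final class Span {
        final int start; // index of the first token (inclusive)
        final int end;   // index after the last token (exclusive)
        final String type;

        public Span(int start, int end, String type) {
            if (start < 0 || end <= start) {
                throw new IllegalArgumentException("span must cover at least one token");
            }
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Spans must be sorted by start, non-overlapping, and inside the sentence. */
    public static String format(String[] tokens, List<Span> spans) {
        int prevEnd = 0;
        for (Span s : spans) {
            if (s.start < prevEnd || s.end > tokens.length) {
                throw new IllegalArgumentException(
                        "spans must be sorted, non-overlapping and within the sentence");
            }
            prevEnd = s.end;
        }

        List<String> out = new ArrayList<>();
        int next = 0; // index of the next span to open or close
        for (int i = 0; i < tokens.length; i++) {
            if (next < spans.size() && spans.get(next).start == i) {
                out.add("<START:" + spans.get(next).type + ">");
            }
            out.add(tokens[i]);
            if (next < spans.size() && spans.get(next).end == i + 1) {
                out.add("<END>");
                next++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "invoice", "is", "due", "end", "of", "next", "quarter", "."};
        List<Span> spans = List.of(new Span(4, 8, "date"));
        // Prints: The invoice is due <START:date> end of next quarter <END> .
        System.out.println(format(tokens, spans));
    }
}
```

Lines produced this way can be appended to your own training corpus and a new model trained from it, for example with the command-line trainer described in the manual: `opennlp TokenNameFinderTrainer -model en-ner-custom-date.bin -lang en -data en-ner-date.train -encoding UTF-8` (the file names here are placeholders). The result is a new model file, not a patched en-ner-date.bin, which matches the point above that the binary models cannot be edited in place.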