java opennlp training-data

Train our own OpenNLP model and use it in Java


I'm new to OpenNLP. I want to know how to build and train our own model in Java with OpenNLP so that it picks out our specific data. I'd highly appreciate any answers.


Solution

  • There are several trainable components in OpenNLP: DocumentCategorizer, NameFinder, Tokenizer, POSTagger, Chunker, and Parser.

    The ones I have used the most are the NameFinder (for named entity extraction/recognition) and the DocumentCategorizer, which is used for text classification tasks such as sentiment analysis.

    The NameFinder has its own training format. These posts might help you understand it: "traning OPenNLP error" and "Writing our own models in openNLP".

    The DocumentCategorizer has a different format, but it is quite simple. Take a look at the docs on the OpenNLP site: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.htm
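
    As a rough illustration of that format: as far as I recall from the manual, the DocumentCategorizer training file holds one document per line, with the category label first, then whitespace, then the document text. The sketch below only builds such lines with plain Java and does not call OpenNLP itself. The class name DoccatTrainingData, the method toTrainingLine, and the sample texts and labels are all made up for this example.

```java
public class DoccatTrainingData {

    /**
     * Builds one training line in the assumed "category text" layout.
     * The category must be a single token, and the text is squeezed onto
     * one line because each line is treated as one document.
     */
    public static String toTrainingLine(String category, String text) {
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be a single non-empty token");
        }
        // Collapse newlines and repeated spaces so the document stays on one line.
        String oneLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + oneLine;
    }

    public static void main(String[] args) {
        // Hypothetical sentiment samples; real training data needs many more documents.
        System.out.println(toTrainingLine("positive", "I love this phone , the battery lasts all day ."));
        System.out.println(toTrainingLine("negative", "The screen cracked\nafter one week ."));
    }
}
```

    Saving these lines to a file (for example en-doccat.train) should give you something you can feed to the trainer. If I remember the manual correctly, the command-line form is `opennlp DoccatTrainer -model en-doccat.bin -lang en -data en-doccat.train -encoding UTF-8`, but check the docs for your OpenNLP version.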

    HTH

    Just saw your comment, so I'm updating. You want to train a NameFinder for your use case. To do that, create a file of sentences, annotate the entities in each sentence as shown in the links I provided, then build the model. You'll want about 15000 sentences to get really good results.
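
    To make the annotation step concrete, here is a minimal sketch of producing one annotated sentence. It assumes the usual NameFinder layout: one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type>` and `<END>` tags that are separated from the tokens by spaces. The code is plain Java and does not call OpenNLP. The class NameFinderTrainingData, the Entity helper, and the sample sentence are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class NameFinderTrainingData {

    /** Hypothetical helper: an entity covering tokens [start, end) with a type label. */
    public static final class Entity {
        final int start;
        final int end;
        final String type;

        public Entity(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /**
     * Wraps each entity in <START:type> ... <END>, with spaces around the tags.
     * Entities are expected to be sorted by start index and must not overlap.
     */
    public static String annotate(String[] tokens, List<Entity> entities) {
        StringBuilder sb = new StringBuilder();
        int next = 0;       // index of the next entity that has not been opened yet
        Entity open = null; // entity currently being written, if any
        for (int i = 0; i < tokens.length; i++) {
            if (open == null && next < entities.size() && entities.get(next).start == i) {
                open = entities.get(next++);
                sb.append("<START:").append(open.type).append("> ");
            }
            sb.append(tokens[i]);
            if (open != null && open.end == i + 1) {
                sb.append(" <END>");
                open = null;
            }
            if (i < tokens.length - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sentence; tokens 0-1 form a person name, token 4 is an organization.
        String[] tokens = "Jane Doe works at Acme in Berlin .".split(" ");
        List<Entity> entities = Arrays.asList(
                new Entity(0, 2, "person"),
                new Entity(4, 5, "organization"));
        System.out.println(annotate(tokens, entities));
    }
}
```

    Write one such line per sentence into a training file, then build the model from it. If I recall the manual correctly, the command-line trainer is invoked as `opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data en-ner-person.train -encoding UTF-8`; treat the exact flags as something to verify against the docs for your version.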