Custom model training in OpenNLP


Hi, I have already referred to this, this, this and this, but I am still finding it difficult to build a custom name finder model. Here is the code:

public class CustomClassifierTrainer {

    private static final TokenNameFinderFactory TokenNameFinderFactory = null;
    static String onlpModelPath = "/Users/user/eclipse-workspace/openNLP/OpenNLP_models/en-ner-asiannames.bin";
    // training data set
    static String trainingDataFilePath = "/Users/user/eclipse-workspace/openNLP/trainingData/asiannames.txt";

    public static void main(String[] args) throws IOException {

        Charset charset = Charset.forName("UTF-8");

        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new FileInputStream(trainingDataFilePath), charset);

        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        try {
          model = NameFinderME.train("en", "asian.person", sampleStream, TrainingParameters.defaultParams(),
                  TokenNameFinderFactory nameFinderFactory);
        }
        finally {
          sampleStream.close();
        }

        BufferedOutputStream modelOut = null;
        try {
          modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath));
          model.serialize(modelOut);
        } finally {
          if (modelOut != null) 
             modelOut.close();      
        }



    }

}

I keep getting a compile error on this line:

ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream(trainingDataFilePath), charset);

The IDE asks me to cast argument 1. When I change it to

ObjectStream<String> lineStream = new PlainTextByLineStream((InputStreamFactory) new FileInputStream(trainingDataFilePath), charset);

Then I get a runtime error saying it can't be cast. Here is the error:

Exception in thread "main" java.lang.ClassCastException: class java.io.FileInputStream cannot be cast to class opennlp.tools.util.InputStreamFactory (java.io.FileInputStream is in module java.base of loader 'bootstrap'; opennlp.tools.util.InputStreamFactory is in unnamed module of loader 'app')
    at openNLP.CustomClassifierTrainer.main(CustomClassifierTrainer.java:35)

The second issue is at this line:

try {
  model = NameFinderME.train("en", "asian.person", sampleStream, TrainingParameters.defaultParams(),
              TokenNameFinderFactory nameFinderFactory);
}

This gives a syntax error, and I'm not sure what's wrong. Any help would be appreciated, as I have tried all the code snippets from the links above.

Regards,


Solution

  • First error: the PlainTextByLineStream constructor expects an InputStreamFactory, and you're trying to pass an InputStream. An InputStream is not an InputStreamFactory, just like a Pizza is not a Car.

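    If someone (the compiler) asks you for a Car and you give him a Pizza, he won't be able to drive it. Pretending that a Pizza is a Car by telling him "trust me, this pizza is a car" (which is what casting does) won't solve the problem: the compiler takes your word for it, and the lie is only discovered at run time, which is exactly the ClassCastException you got.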

    So you need to actually pass an InputStreamFactory. Look at the javadoc of this interface, and you'll see that it has a single method, createInputStream(), which takes no arguments and is supposed to create and return an InputStream.

    A valid value would thus be

    () -> new FileInputStream(trainingDataFilePath)
    

    i.e. a lambda which takes no input and creates a new input stream, and which the compiler can therefore accept as an InputStreamFactory.

  • Second error: this one is even simpler. You're not supposed to specify the types of the arguments when calling a method; types are only written when defining a method (or declaring a variable). So

    NameFinderME.train("en", 
                       "asian.person", 
                       sampleStream, 
                       TrainingParameters.defaultParams(),
                       TokenNameFinderFactory nameFinderFactory);
    

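    should be the following, where nameFinderFactory is a variable you declare yourself. Note that the field in your class is a null constant confusingly named TokenNameFinderFactory; rather than passing null, create a default factory:

    TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
    NameFinderME.train("en", 
                       "asian.person", 
                       sampleStream, 
                       TrainingParameters.defaultParams(),
                       nameFinderFactory);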
    

    Practice with simpler programs to learn the Java syntax. Learn to read error messages instead of ignoring them, and to read the javadoc of the classes you're using. This is critical.
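The "a Pizza is not a Car" point can be checked without OpenNLP at all. The sketch below declares a hypothetical `StreamFactory` interface with the same shape as `InputStreamFactory` (one no-argument method that returns a new `InputStream`); it is a stand-in for illustration, not the OpenNLP type. It shows that a lambda satisfies the interface, that every call produces a fresh stream over the same data, and that casting an `InputStream` to the interface compiles but fails at run time with `ClassCastException`, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in with the same shape as opennlp.tools.util.InputStreamFactory:
// a single abstract method that takes no arguments and returns a new InputStream.
interface StreamFactory {
    InputStream createInputStream() throws IOException;
}

class FactoryDemo {

    // Asks the factory for a fresh stream, reads it completely and decodes it as UTF-8.
    static String readAll(StreamFactory factory) {
        try (InputStream in = factory.createInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if casting the given InputStream to StreamFactory fails at run time.
    static boolean castFails(InputStream stream) {
        try {
            // Compiles, because a non-final class could have a subclass implementing the interface.
            Object ignored = (StreamFactory) stream;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = "<START:asian.person> Li Wei <END> arrived .\n".getBytes(StandardCharsets.UTF_8);

        // A lambda with no parameters that returns a new stream is a valid StreamFactory.
        StreamFactory factory = () -> new ByteArrayInputStream(data);

        // Each call creates a new stream, so the same data can be read more than once.
        System.out.println(readAll(factory).equals(readAll(factory))); // true

        // An InputStream is not a factory: the cast compiles but throws at run time.
        System.out.println(castFails(new ByteArrayInputStream(data))); // true
    }
}
```

The cast compiles because `ByteArrayInputStream` (like `FileInputStream`) is not `final`, so the compiler cannot rule out a subclass that also implements the interface; the check is deferred to run time, where it fails for the actual object.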
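Putting both fixes together, the trainer from the question could look roughly like the sketch below. It assumes an OpenNLP 1.8/1.9-style API (where `PlainTextByLineStream` takes an `InputStreamFactory` and `NameFinderME.train` accepts a `TokenNameFinderFactory`), `opennlp-tools` on the classpath, and training data in OpenNLP's `<START:...> ... <END>` name-sample format at the path from the question. The `null` `TokenNameFinderFactory` field is replaced by a default `new TokenNameFinderFactory()`, and try-with-resources replaces the manual `close()` calls.

```java
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class CustomClassifierTrainer {

    static String onlpModelPath = "/Users/user/eclipse-workspace/openNLP/OpenNLP_models/en-ner-asiannames.bin";
    // training data set
    static String trainingDataFilePath = "/Users/user/eclipse-workspace/openNLP/trainingData/asiannames.txt";

    public static void main(String[] args) throws IOException {
        // Fix 1: pass a factory (a lambda that opens the file), not an already-opened InputStream.
        InputStreamFactory inputStreamFactory = () -> new FileInputStream(trainingDataFilePath);

        ObjectStream<String> lineStream =
                new PlainTextByLineStream(inputStreamFactory, StandardCharsets.UTF_8);

        TokenNameFinderModel model;
        // Closing the sample stream also closes the underlying line stream.
        try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
            // Fix 2: declare the factory as a variable, then pass the variable without a type.
            TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
            model = NameFinderME.train("en", "asian.person", sampleStream,
                    TrainingParameters.defaultParams(), nameFinderFactory);
        }

        // Write the trained model to disk; try-with-resources closes the output stream.
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(onlpModelPath))) {
            model.serialize(modelOut);
        }
    }
}
```

Instead of the lambda, OpenNLP also ships `opennlp.tools.util.MarkableFileInputStreamFactory`, so `new MarkableFileInputStreamFactory(new File(trainingDataFilePath))` (with `java.io.File` imported) should work as the factory too.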