I'm trying to use the training API of POSTagger, but I would like to append the newly trained data to the old model. Otherwise, if I train multiple times, I end up with a lot of model files. How can I combine the results back into the existing model, so that I have a single model trained on more data? I think the model file is binary, so I'm not sure whether simply appending the files would work in this case.
Here is my code:
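    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;
    import opennlp.tools.util.model.ModelType;

    public class POSTraining {
        private final String outputModel;
        private InputStream dataIn;

        public POSTraining() throws IOException {
            // Both the model to overwrite and the training data are classpath resources.
            outputModel = this.getClass().getResource("/model/en-pos-maxent.bin").getPath();
            dataIn = this.getClass().getResourceAsStream("/model/en-pos.train");
        }

        public static void main(String args[]) throws IOException {
            POSTraining posTraining = new POSTraining();
            posTraining.train();
        }

        public void train() {
            try {
                // One sentence per line, tokens in word_TAG form.
                ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
                ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
                TrainingParameters trainParams = new TrainingParameters();
                trainParams.put("model", ModelType.MAXENT.name());
                POSModel trainedModel = POSTaggerME.train("en", sampleStream, trainParams, null, null);
                // try-with-resources flushes and closes the stream, so the
                // model file is written completely.
                try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(outputModel))) {
                    trainedModel.serialize(modelOut);
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                if (dataIn != null) {
                    try {
                        dataIn.close();
                    } catch (IOException e) {
                        // Not an issue, training already finished.
                        // The exception should be logged and investigated
                        // if part of a production system.
                        e.printStackTrace();
                    }
                }
            }
        }
    }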
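That's generally not possible with this kind of NLP model. A trained model stores only the learned parameters, not the data it was trained on, so you can't incrementally adjust it or merge two model files into one. Appending the binary files will not work either. If you want one model that covers more data, keep all of your training data and retrain on the combined set.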
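Since the model itself can't be extended, one practical workaround is to merge the training data instead and retrain from scratch on the result. The following is only a minimal sketch under some assumptions: the class name `TrainingDataMerger` and the `merge` method are hypothetical helpers (not part of OpenNLP), and each input file is assumed to be UTF-8 text in the usual one-sentence-per-line `word_TAG` format. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: merges several POS training files into one corpus. */
public class TrainingDataMerger {

    /**
     * Copies every non-blank line of each part into one combined file
     * (one sentence per line) and returns the number of lines written.
     */
    public static int merge(List<Path> parts, Path combined) throws IOException {
        List<String> sentences = new ArrayList<>();
        for (Path part : parts) {
            // Reading line by line avoids gluing two sentences together
            // when a file does not end with a newline.
            for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                if (!line.trim().isEmpty()) {
                    sentences.add(line);
                }
            }
        }
        Files.write(combined, sentences, StandardCharsets.UTF_8);
        return sentences.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: TrainingDataMerger <combined> <part>...");
            return;
        }
        List<Path> parts = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            parts.add(Paths.get(args[i]));
        }
        int written = merge(parts, Paths.get(args[0]));
        System.out.println("Wrote " + written + " sentences to " + args[0]);
    }
}
```

You would then point the training code from the question at the combined file and overwrite the single model file on every run. The cost is that each run retrains on everything, but you always end up with exactly one model.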