I have the following code for training the OpenNLP POS tagger:
Trainer(String trainingData, String modelSavePath, String dictionary) {
    // model is a POSModel field of the enclosing class
    ObjectStream<String> lineStream = null;
    try {
        InputStreamFactory dataIn = new MarkableFileInputStreamFactory(
                new File(trainingData));
        lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
        POSTaggerFactory fac = new POSTaggerFactory();
        if (dictionary != null && dictionary.length() > 0) {
            fac.setDictionary(new Dictionary(new FileInputStream(dictionary)));
        }
        model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
    } catch (IOException e) {
        // Failed to read or parse the training data; training failed.
        e.printStackTrace();
    } finally {
        if (lineStream != null) {
            try {
                lineStream.close();
            } catch (IOException e) {
                // Not an issue, training already finished.
                // The exception should be logged and investigated
                // if part of a production system.
                e.printStackTrace();
            }
        }
    }
}
and this works just fine. Now, is it possible to do the same without involving files? I want to store the training data in a database somewhere, then read it as a stream or in chunks and feed it to the trainer. I do not want to create a temporary file. Is this possible?
Yes. Instead of passing a FileInputStream to the dictionary, you can create your own InputStream implementation, say a DatabaseSourceInputStream, and use it instead. The same idea covers the training data: PlainTextByLineStream accepts an InputStreamFactory, so you can supply a factory that opens a stream over your database rows instead of a MarkableFileInputStreamFactory.
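A minimal sketch of what that could look like, assuming the training text is stored one annotated sentence per line and that buffering the rows in memory is acceptable. The class name, table name, and column names below are placeholders, not part of OpenNLP:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Exposes training text stored in the database as a plain InputStream.
// This version buffers all rows in memory; for a very large corpus you
// could stream the ResultSet lazily instead.
public class DatabaseSourceInputStream extends InputStream {

    private final InputStream delegate;

    public DatabaseSourceInputStream(DataSource dataSource) throws SQLException {
        StringBuilder sb = new StringBuilder();
        try (Connection con = dataSource.getConnection();
             Statement stmt = con.createStatement();
             // "training_data" and "line" are placeholder names; adjust to your schema
             ResultSet rs = stmt.executeQuery("SELECT line FROM training_data ORDER BY id")) {
            while (rs.next()) {
                sb.append(rs.getString("line")).append('\n');
            }
        }
        delegate = new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public int read() throws IOException {
        return delegate.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}

For the training data, wrap this in an InputStreamFactory, which must be able to hand out a fresh stream each time OpenNLP resets the sample stream (InputStreamFactory has a single createInputStream() method, so a lambda works):

InputStreamFactory dataIn = () -> {
    try {
        return new DatabaseSourceInputStream(dataSource);
    } catch (SQLException e) {
        throw new IOException(e);
    }
};
lineStream = new PlainTextByLineStream(dataIn, "UTF-8");

Here dataSource is whatever javax.sql.DataSource your application already has. The dictionary can be fed the same way, except that its stream must yield the dictionary's XML content rather than training sentences, so you would back it with a different query or table. No temporary files are involved.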