
Reading POS tag models in Android


I have done POS tagging with OpenNLP POS models in a normal Java application. Now I would like to implement it on the Android platform. I am not sure what Android's requirements or restrictions are, because I am not able to read the model (a binary file) and run the POS tagging properly.

I tried loading the .bin file from external storage and also putting it in the external libraries, but it still doesn't work. This is my code:

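import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import android.os.Environment;
import opennlp.tools.postag.POSModel;

InputStream modelIn = null;
POSModel model = null;

// The model file was copied to <external storage>/TextSumIt/
String path = Environment.getExternalStorageDirectory().getPath() + "/TextSumIt/en-pos-maxent.bin";

modelIn = new BufferedInputStream(new FileInputStream(path));
model = new POSModel(modelIn); // throws the InvalidFormatException shown below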

The error I got:

11-15 06:39:35.072: W/System.err(565): opennlp.tools.util.InvalidFormatException: The profile data stream has an invalid format!
11-15 06:39:35.177: W/System.err(565):  at opennlp.tools.dictionary.serializer.DictionarySerializer.create(DictionarySerializer.java:224)
11-15 06:39:35.177: W/System.err(565):  at opennlp.tools.postag.POSDictionary.create(POSDictionary.java:282)
11-15 06:39:35.182: W/System.err(565):  at opennlp.tools.postag.POSModel$POSDictionarySerializer.create(POSModel.java:48)
11-15 06:39:35.182: W/System.err(565):  at opennlp.tools.postag.POSModel$POSDictionarySerializer.create(POSModel.java:44)
11-15 06:39:35.182: W/System.err(565):  at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:135)
11-15 06:39:35.197: W/System.err(565):  at opennlp.tools.postag.POSModel.<init>(POSModel.java:93)
11-15 06:39:35.197: W/System.err(565):  at com.main.textsumit.SummarizationActivity.postagWords(SummarizationActivity.java:676)
11-15 06:39:35.205: W/System.err(565):  at com.main.textsumit.SummarizationActivity.generateSummary(SummarizationActivity.java:252)
11-15 06:39:35.205: W/System.err(565):  at com.main.textsumit.SummarizationActivity.onCreate(SummarizationActivity.java:127)

What is causing the model not to be read properly, and how can I resolve this? Please help.

Thank you.


Solution

  • For what it's worth, in case this is still an issue: I had a similar problem using the POS model in a different (non-Android) context, and in my case the failure appeared to come from extracting the .bin file, not from the model itself. The problem seems to be confined to the tags.tagdict file in the archive (as suggested here: http://sharpnlp.codeplex.com/discussions/263620), so if you don't currently need the tag dictionary (I didn't for my simple scenarios), try removing that file from the archive. (But keep the archive itself as a zip file, since the model is expected to arrive in zipped form.)
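The workaround above can be scripted with the standard java.util.zip classes instead of editing the archive by hand. The sketch below copies the model archive entry by entry and leaves out the entry named tags.tagdict, so the output is still a zip file as the answer requires. The class name StripTagDict and the method copyWithout are hypothetical helpers, not part of OpenNLP. The sketch assumes the tag dictionary is stored under exactly the entry name tags.tagdict (as the answer describes) and that your tagging works without it; it is meant to be run once on a desktop JVM, after which you copy the resulting .bin to the device.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StripTagDict {

    /**
     * Copies every entry of the zip archive read from 'in' to 'out',
     * except the entry whose name equals 'skipName'.
     * Returns true if the skipped entry was present.
     */
    public static boolean copyWithout(InputStream in, OutputStream out, String skipName)
            throws IOException {
        boolean skipped = false;
        ZipInputStream zin = new ZipInputStream(in);
        ZipOutputStream zout = new ZipOutputStream(out);
        byte[] buffer = new byte[8192];
        ZipEntry entry;
        // getNextEntry() also skips any unread data of the previous entry
        while ((entry = zin.getNextEntry()) != null) {
            if (entry.getName().equals(skipName)) {
                skipped = true; // leave this entry out of the new archive
                continue;
            }
            // Fresh entry with the same name; size and CRC are recomputed on write
            zout.putNextEntry(new ZipEntry(entry.getName()));
            int n;
            while ((n = zin.read(buffer)) != -1) {
                zout.write(buffer, 0, n);
            }
            zout.closeEntry();
        }
        zout.finish(); // writes the central directory without closing 'out'
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripTagDict <input .bin> <output .bin>
        // "tags.tagdict" is the entry name taken from the answer (assumption)
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            boolean found = copyWithout(in, out, "tags.tagdict");
            System.out.println(found ? "Removed tags.tagdict" : "tags.tagdict not found");
        }
    }
}
```

For example, `java StripTagDict en-pos-maxent.bin en-pos-maxent-notagdict.bin`, then point the path in the question's code at the new file. Rewriting every entry into a new archive, rather than patching the original in place, keeps the zip's central directory consistent with its contents.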