Tags: nlp, opennlp, dkpro-core, webanno

How to convert WebAnno Named Entity annotations for use in OpenNLP?


Based on this issue, I need to export my annotations in XMI format and use DKPro Core to convert them to brat format:

https://github.com/webanno/webanno/issues/328

I tried this code, but it did not work:

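import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import de.tudarmstadt.ukp.dkpro.core.io.brat.BratWriter;
import de.tudarmstadt.ukp.dkpro.core.io.xmi.XmiReader;

public void convert() throws Exception {
    SimplePipeline.runPipeline(
            // Read every *.xmi file found in /tmp
            CollectionReaderFactory.createReaderDescription(XmiReader.class,
                    XmiReader.PARAM_SOURCE_LOCATION, "/tmp",
                    XmiReader.PARAM_PATTERNS, XmiReader.INCLUDE_PREFIX + "*.xmi"),
            // Write the annotations back out in brat format
            AnalysisEngineFactory.createEngineDescription(BratWriter.class,
                    BratWriter.PARAM_TARGET_LOCATION, "/tmp"));
}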

Solution

  • The brat dialect that the DKPro Core BratWriter produces may differ from the one OpenNLP expects; the brat file format is quite flexible.

    If you are using the built-in Named Entity layer in WebAnno, then I would propose an alternative route:

    • Stay with the XMI export
    • Load the XMI with DKPro Core 1.9.0-SNAPSHOT and feed it to the OpenNlpNamedEntityRecognizerTrainer component

    That should avoid the need for the additional conversion step.
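
For orientation, OpenNLP's name finder can also be trained from its own plain-text format: one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type> ... <END>` markers. The sketch below only illustrates that target format. It is not the DKPro Core trainer component and not part of OpenNLP or WebAnno; the class `NameSampleLineFormatter`, its `EntitySpan` type, and the `format` method are hypothetical names, and it assumes the tokens and token-index spans have already been extracted from the XMI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenNLP, DKPro Core, or WebAnno.
class NameSampleLineFormatter {

    // A named entity covering tokens [begin, end) with a label such as "person".
    static final class EntitySpan {
        final int begin;
        final int end;
        final String type;

        EntitySpan(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }
    }

    // Builds one training line for a single sentence.
    // Assumption: spans are sorted by begin index and do not overlap.
    static String format(List<String> tokens, List<EntitySpan> spans) {
        List<String> out = new ArrayList<>();
        int next = 0; // index of the next span that has not been closed yet
        for (int i = 0; i < tokens.size(); i++) {
            if (next < spans.size() && spans.get(next).begin == i) {
                out.add("<START:" + spans.get(next).type + ">");
            }
            out.add(tokens.get(i));
            if (next < spans.size() && spans.get(next).end == i + 1) {
                out.add("<END>");
                next++;
            }
        }
        return String.join(" ", out);
    }
}
```

Calling `format` with the tokens `Pierre Vinken joined Elsevier .` and the spans (0, 2, "person") and (3, 4, "organization") produces `<START:person> Pierre Vinken <END> joined <START:organization> Elsevier <END> .`. Feeding the XMI to the trainer component, as suggested above, makes a manual step like this unnecessary.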

    Disclosure: I am a WebAnno and DKPro Core developer.

    Suggestions that didn't work: