
How to create a pipeline of Java NLP components and Ruta scripts?


I'm working on a Maven project that dynamically executes some Ruta scripts to annotate some tags and then processes the output in Java.

Now I want to run NLP components (mostly DKPro) first, pass their output to the Ruta scripts in a single pipeline, and process the result further. How can I achieve this?


Edited:

Below is my new pipeline code:

    AnalysisEngineDescription pipeline = createEngineDescription(createEngineDescription(OpenNlpSegmenter.class),
            createEngineDescription(OpenNlpPosTagger.class),
            AnalysisEngineFactory.createEngineDescription(RutaEngine.class, RutaEngine.PARAM_MAIN_SCRIPT,
                    "com.textjuicer.ruta.date.Author_updated"),
            createEngineDescription(ConsoleWriter.class));

Error:

Not able to resolve type: Reference

    May 25, 2016 6:45:43 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
    SEVERE: Exception occurred
    org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:170)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:191)
        at com.textjuicer.ruta.date.ArtifactAnnotator.runNLP(ArtifactAnnotator.java:225)
        at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:70)
        at com.textjuicer.ruta.date.ArtifactAnnotator.main(ArtifactAnnotator.java:38)
    Caused by: java.lang.IllegalArgumentException: Not able to resolve type: Reference
        at org.apache.uima.ruta.expression.type.SimpleTypeExpression.getType(SimpleTypeExpression.java:48)
        at org.apache.uima.ruta.rule.RegExpRule.getGroup2Types(RegExpRule.java:148)
        at org.apache.uima.ruta.rule.RegExpRule.apply(RegExpRule.java:80)
        at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
        at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
        ... 17 more

    Exception in thread "main" org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:170)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:191)
        at com.textjuicer.ruta.date.ArtifactAnnotator.runNLP(ArtifactAnnotator.java:225)
        at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:70)
        at com.textjuicer.ruta.date.ArtifactAnnotator.main(ArtifactAnnotator.java:38)
    Caused by: java.lang.IllegalArgumentException: Not able to resolve type: Reference
        at org.apache.uima.ruta.expression.type.SimpleTypeExpression.getType(SimpleTypeExpression.java:48)
        at org.apache.uima.ruta.rule.RegExpRule.getGroup2Types(RegExpRule.java:148)
        at org.apache.uima.ruta.rule.RegExpRule.apply(RegExpRule.java:80)
        at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
        at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
        ... 17 more


Solution

  • You can simply add the Ruta script as an analysis engine at the end of your DKPro pipeline. The exact code depends mainly on how you build and run your pipeline.

    Adapted from the uimaFIT documentation:

    // your collection reader
    CollectionReaderDescription reader =
      CollectionReaderFactory.createReaderDescription(
        TextReader.class,
        TextReader.PARAM_INPUT, "/home/uimafit/documents");

    // some DKPro Core component
    AnalysisEngineDescription dkpro =
      AnalysisEngineFactory.createEngineDescription(
        Tokenizer.class);

    // the Ruta script, wrapped as an analysis engine
    AnalysisEngineDescription ruta =
      AnalysisEngineFactory.createEngineDescription(
        RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, "Main.ruta");

    // some writer
    AnalysisEngineDescription writer =
      AnalysisEngineFactory.createEngineDescription(
        XmiWriter.class,
        XmiWriter.PARAM_OUTPUT, "/home/uimafit/output");

    // components run in the order given: reader, DKPro, Ruta, writer
    SimplePipeline.runPipeline(reader, dkpro, ruta, writer);
    

    You can create an analysis engine for your Ruta script with the uimaFIT factories, either by specifying the mainScript parameter or by configuring the rules directly with PARAM_RULES. You can also use the XML descriptor of the Ruta script to create the analysis engine.
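When the script is referenced by a qualified name through mainScript, as in the question, a quick sanity check is to confirm that the script file is actually visible on the classpath before building the pipeline. The sketch below uses only the JDK and rests on an assumption: that the engine resolves a name such as `com.textjuicer.ruta.date.Author_updated` by replacing dots with slashes and appending `.ruta`. `RutaScriptLocator` is a hypothetical helper, not part of UIMA Ruta, and it does not cover scripts found through configured script paths on the file system.

```java
import java.net.URL;

/** Hypothetical diagnostic helper; not part of UIMA Ruta or uimaFIT. */
final class RutaScriptLocator {

    private RutaScriptLocator() {
    }

    /**
     * Maps a qualified script name to a classpath resource path,
     * e.g. "com.example.Main" becomes "com/example/Main.ruta".
     * Assumption: this mirrors how the engine looks up mainScript.
     */
    static String toResourcePath(String qualifiedScriptName) {
        return qualifiedScriptName.replace('.', '/') + ".ruta";
    }

    /** Returns the URL of the script on the classpath, or null if it is absent. */
    static URL locate(String qualifiedScriptName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(toResourcePath(qualifiedScriptName));
    }

    public static void main(String[] args) {
        // Script name taken from the question; pass another name as the first argument.
        String script = args.length > 0 ? args[0] : "com.textjuicer.ruta.date.Author_updated";
        URL url = locate(script);
        System.out.println(toResourcePath(script) + " -> "
                + (url != null ? url.toString() : "not found on classpath"));
    }
}
```

In a Maven project the script would typically have to sit under `src/main/resources` (or another resource directory) in the matching package folder for this lookup to succeed.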

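    If the Ruta script declares new types, then either the XML descriptor has to be used to create the analysis engine, or the types.txt file of uimaFIT needs to be extended with the type system generated for the script (or the type system needs to be included in some other way). An error such as "Not able to resolve type: Reference" typically indicates that a type used by the script is missing from the pipeline's type system.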
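As a rough illustration of the types.txt route: uimaFIT scans the classpath for files named `META-INF/org.apache.uima.fit/types.txt` and treats each line as a location pattern for type system descriptors. The single line below would be the whole content of such a file, for example at `src/main/resources/META-INF/org.apache.uima.fit/types.txt`. The descriptor path is an assumption for this example: it presumes the type system generated for the script from the question is named `Author_updatedTypeSystem.xml` and is packaged on the classpath under the script's package folder, which depends on how the descriptors are generated in your build.

```text
classpath*:com/textjuicer/ruta/date/Author_updatedTypeSystem.xml
```

With this in place, engine descriptions created through the uimaFIT factories without an explicit type system should pick up the script's types automatically.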

    If the Ruta script imports and calls other scripts, then either the generated descriptor needs to be used, or the corresponding parameters, e.g., additionalScripts, need to be set correctly. The same is true for imported analysis engines.

    If you import the NLP/DKPro type system in your Ruta script, you can simply write rules using the DKPro annotations.

    (I am a developer of UIMA Ruta)