Search code examples
stanford-nlpuimarutacleartk

cleartk dependency not found when calling StanfordCoreNLPAnnotator from UIMA RUTA


I am trying to call ClearTK's StanfordCoreNLPAnnotator from within UIMA RUTA, but cannot get it to work. I am using eclipse with a maven-enabled RUTA project in which I also have Java code for auxiliary tasks. I have imported cleartk-stanford-corenlp 0.8 using maven.

I tried using this line in my script:

ENGINE utils.MyStanfordEngine;

... where utils/MyStanfordEngine.xml is an XML descriptor file created using this java code:

MyStanfordAnnotator.getDescription().toXML(new FileOutputStream("descriptor/utils/MyStanfordEngine.xml"));

No errors appear, but upon execution I get:

Exception in thread "main" org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class ... failed.  
(Descriptor: file:.../descriptor/mainScriptEngine.xml)
...
Caused by: org.apache.uima.resource.ResourceInitializationException: Annotator class 
"org.cleartk.stanford.StanfordCoreNLPAnnotator" was not found. 
(Descriptor: file:.../descriptor/utils/MyStanfordEngine.xml)
...

I think I understand that the RUTA project does not find it in the Maven dependencies, but I need to stick to Maven as my dependency tool because of collaboration purposes.

Can someone help?


UPDATE:

When I encountered the problem, I was using RUTA 2.1.0. I have updated to 2.2.0rc1 since then, but the problem persisted.

With Peter's suggestion below (Thanks!), in the Java build path, I referenced a blank Maven-enabled Java project that does nothing but imports cleartk-stanford-corenlp 0.8. I can now run the following RUTA code:

TYPESYSTEM utils.CleartkRutaTypeSystem;
ENGINE utils.MyStanfordEngine;
Document{-> CALL(MyStanfordEngine)};

... successfully does what looks like all intended annotations for all documents in the input folder, but eventually crashes with this Exception:

[Stanford Tools Logging output ...]
22.02.2014 12:44:22 org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl        callAnalysisComponentProcess(406)
SCHWERWIEGEND: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:477)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:374)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.ruta.ide.launching.RutaLauncher.processFile(RutaLauncher.java:168)
at org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:129)
Caused by: java.lang.NullPointerException
at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:483)
at org.apache.uima.cas.impl.CASImpl.createAnnotation(CASImpl.java:3837)
at org.apache.uima.ruta.action.CallAction.callEngine(CallAction.java:192)
at org.apache.uima.ruta.action.CallAction.execute(CallAction.java:62)
at org.apache.uima.ruta.rule.AbstractRuleElement.apply(AbstractRuleElement.java:130)
at org.apache.uima.ruta.rule.RuleElementCaretaker.applyRuleElements(RuleElementCaretaker.java:111)
at org.apache.uima.ruta.rule.ComposedRuleElement.applyRuleElements(ComposedRuleElement.java:547)
at org.apache.uima.ruta.rule.AbstractRuleElement.doneMatching(AbstractRuleElement.java:84)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallback(ComposedRuleElement.java:468)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:377)
at org.apache.uima.ruta.rule.RutaRuleElement.startMatch(RutaRuleElement.java:100)
at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:73)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:47)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:40)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:29)
at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:475)
... 6 more
Exception in thread "main" org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.    
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:477)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:374)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.ruta.ide.launching.RutaLauncher.processFile(RutaLauncher.java:168)
at org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:129)
Caused by: java.lang.NullPointerException
at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:483)
at org.apache.uima.cas.impl.CASImpl.createAnnotation(CASImpl.java:3837)
at org.apache.uima.ruta.action.CallAction.callEngine(CallAction.java:192)
at org.apache.uima.ruta.action.CallAction.execute(CallAction.java:62)
at org.apache.uima.ruta.rule.AbstractRuleElement.apply(AbstractRuleElement.java:130)
at org.apache.uima.ruta.rule.RuleElementCaretaker.applyRuleElements(RuleElementCaretaker.java:111)
at org.apache.uima.ruta.rule.ComposedRuleElement.applyRuleElements(ComposedRuleElement.java:547)
at org.apache.uima.ruta.rule.AbstractRuleElement.doneMatching(AbstractRuleElement.java:84)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallback(ComposedRuleElement.java:468)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:377)
at org.apache.uima.ruta.rule.RutaRuleElement.startMatch(RutaRuleElement.java:100)
at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:73)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:47)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:40)
at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:29)
at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:475)
... 6 more

Sorry for the whole stack trace, but I thought if a RUTA developer is reading this they may want the whole thing.

Is there a way to solve this? What am I doing wrong?


Solution

  • There are several limitations to consider:

    • UIMA Ruta 2.1.0 does not support mixin projects: maven dependencies need to be specified in another project. The Ruta project then has to depend on the additional java project.
    • UIMA Ruta Workbench 2.1.0 has some problems validating imported type system that import again other type systems by name. Here, rather import by location should be used.
    • UIMA CAS Editor 2.5.0 has some problems resolving type system imports using the datapath, which causes problems visualizing the created annotations if the type system descriptor needs additional information such as the datapath. Here, the creation of a type system descriptor of a script should include (not only import) all types of imported type systems. This can be configured in the preferences (I have not used that for a while). This problem can again be prevented by using import by location.
    • UIMA Ruta 2.2.0 supports mixin projects. Here, only the problem with the CAS Editor remains.

    This described project can be created the following way (with UIMA Ruta 2.2.0):

    1. Create a new UIMA Ruta Project
    2. Make it a maven project: popup->Configure->Convert to Maven Project
    3. Add a dependency to cleartk-stanford-corenlp in the pom

      <dependency>
      <groupId>org.cleartk</groupId>
      <artifactId>cleartk-stanford-corenlp</artifactId>
      <version>0.8.0</version>
      </dependency>
      
    4. Provide the type systems in the descriptor folder or in a dependent project, e.g., copy the org folder of cleartk-type-system-1.2.0 to the descriptor folder. Mind that the CAS Editor will have problems resolving the imports, if the descriptors are not adapted.
    5. Create a simple script that imports the type system, imports the analysis engine and excutes the analysis engine. Here, the uimaFIT component is directly imported instead of a descriptor. The EXEC action need to be extended with interesting types if later rules should be able to operate on the result of the imported analysis engine.

      TYPESYSTEM org.cleartk.TypeSystem;
      UIMAFIT org.cleartk.stanford.StanfordCoreNLPAnnotator;
      Document{->EXEC(StanfordCoreNLPAnnotator)};
      
    6. If there is a text file in the import folder, then running this script should be able to annotate it.

    This example directly uses the StanfordCoreNLPAnnotator instead of an additional analysis engine, but switching to another implementation or analysis engine should be straightforward.