nlp data-mining information-retrieval uima

How to define a CAS stored in a database as an external resource for an annotator in uimaFIT?


I am trying to structure my data processing pipeline using uimaFIT as follows:

[annotatorA] => [Consumer to dump annotatorA's annotations from CAS into DB]

[annotatorB (should take on annotatorA's annotations from DB as input)]=>[Consumer for annotatorB]

The driver code:

   /* Step 0: Create a reader */
    CollectionReader readerInstance= CollectionReaderFactory.createCollectionReader(
            FilePathReader.class, typeSystem,
            FilePathReader.PARAM_INPUT_FILE,"/path/to/file/to/be/processed");

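   /* Step 1: Define annotator A (the builder collects the engines of the aggregate) */
    AggregateBuilder builder = new AggregateBuilder();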
    AnalysisEngineDescription annotatorAInstance=
           AnalysisEngineFactory.createPrimitiveDescription(
                    annotatorADbConsumer.class, typeSystem, 
                    annotatorADbConsumer.PARAM_DB_URL,"localhost",
                    annotatorADbConsumer.PARAM_DB_NAME,"xyz",
                    annotatorADbConsumer.PARAM_DB_USER_NAME,"name",
                    annotatorADbConsumer.PARAM_DB_USER_PWD,"pw");
    builder.add(annotatorAInstance);

    /* Step2: Define binding for annotatorB to take 
         what-annotator-a put in DB above as input */

    /*Step 3: Define annotator B */
    AnalysisEngineDescription annotatorBInstance =
            AnalysisEngineFactory.createPrimitiveDescription(
                    GateDateTimeLengthAnnotator.class, typeSystem);
    builder.add(annotatorBInstance);

    /*Step 4: Run the pipeline*/
    SimplePipeline.runPipeline(readerInstance, builder.createAggregate());

Questions I have are:

  1. Is the above approach correct?
  2. How do we define annotatorB's dependency on annotatorA's output in step 2?

Is the approach suggested at https://code.google.com/p/uimafit/wiki/ExternalResources#Resource_injection the right direction to achieve this?


Solution

  • You can define the dependency with @TypeCapability like this:

    @TypeCapability(inputs = { "com.myproject.types.MyType", ... }, outputs = { ... })
    public class MyAnnotator extends JCasAnnotator_ImplBase {
        ....
    }
    

    Note that this defines a contract at the annotation level, not the engine level: any engine could create com.myproject.types.MyType, so the declaration does not tie annotatorB to annotatorA specifically.

    As far as I know, there is no way to enforce it.

    I did write some code that checks whether each engine is provided with its required annotations by the engines upstream of it in the pipeline, and logs an error otherwise (see Pipeline.checkAndAddCapabilities() and Pipeline.addCapabilities()). Note, however, that it only works if all engines declare their TypeCapabilities, which is often not the case when one uses external engines or libraries.
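
To illustrate the idea behind such an upstream check, here is a minimal sketch in plain Java. It is not the Pipeline code mentioned above and it does not call the UIMA API: EngineSpec and findMissingInputs are hypothetical names, and EngineSpec only stands in for the inputs and outputs an engine would declare through @TypeCapability. The check walks the engines in pipeline order and reports every declared input type that no earlier engine declares as an output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CapabilityCheck {

    /** Hypothetical stand-in for the types an engine declares via @TypeCapability. */
    static final class EngineSpec {
        final String name;
        final List<String> inputs;
        final List<String> outputs;

        EngineSpec(String name, List<String> inputs, List<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /** Returns one message per declared input that no upstream engine outputs. */
    static List<String> findMissingInputs(List<EngineSpec> pipeline) {
        Set<String> available = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (EngineSpec engine : pipeline) {
            for (String input : engine.inputs) {
                if (!available.contains(input)) {
                    problems.add(engine.name + " requires " + input);
                }
            }
            // Outputs become available only to engines that run later.
            available.addAll(engine.outputs);
        }
        return problems;
    }

    public static void main(String[] args) {
        String myType = "com.myproject.types.MyType";
        EngineSpec a = new EngineSpec("annotatorA",
                Collections.<String>emptyList(), Arrays.asList(myType));
        EngineSpec b = new EngineSpec("annotatorB",
                Arrays.asList(myType), Collections.<String>emptyList());

        // Correct order: prints []
        System.out.println(findMissingInputs(Arrays.asList(a, b)));
        // annotatorB placed before annotatorA:
        // prints [annotatorB requires com.myproject.types.MyType]
        System.out.println(findMissingInputs(Arrays.asList(b, a)));
    }
}
```

As with the real check, an engine that declares nothing passes silently, so the result is only as reliable as the declarations it is given.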