I am trying to structure my data processing pipeline using uimaFit as follows:
[annotatorA]
=> [Consumer to dump annotatorA's annotations from CAS into DB]
[annotatorB (should take annotatorA's annotations from the DB as input)]
=>[Consumer for annotatorB]
The driver code:
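/* Imports (uimaFit 1.x); FilePathReader, annotatorADbConsumer and
   GateDateTimeLengthAnnotator are project classes, not part of uimaFit */
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.collection.CollectionReader;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.uimafit.factory.AggregateBuilder;
import org.uimafit.factory.AnalysisEngineFactory;
import org.uimafit.factory.CollectionReaderFactory;
import org.uimafit.factory.TypeSystemDescriptionFactory;
import org.uimafit.pipeline.SimplePipeline;
/* Type system (via uimaFit type auto-detection) and the aggregate that holds the engines */
TypeSystemDescription typeSystem =
    TypeSystemDescriptionFactory.createTypeSystemDescription();
AggregateBuilder builder = new AggregateBuilder();
/* Step 0: Create a reader */
CollectionReader readerInstance = CollectionReaderFactory.createCollectionReader(
    FilePathReader.class, typeSystem,
    FilePathReader.PARAM_INPUT_FILE, "/path/to/file/to/be/processed");
/* Step 1: Define annotator A */
AnalysisEngineDescription annotatorAInstance =
    AnalysisEngineFactory.createPrimitiveDescription(
        annotatorADbConsumer.class, typeSystem,
        annotatorADbConsumer.PARAM_DB_URL, "localhost",
        annotatorADbConsumer.PARAM_DB_NAME, "xyz",
        annotatorADbConsumer.PARAM_DB_USER_NAME, "name",
        annotatorADbConsumer.PARAM_DB_USER_PWD, "pw");
builder.add(annotatorAInstance);
/* Step 2: Define the binding for annotatorB to take
   what annotator A put in the DB above as input */
/* Step 3: Define annotator B */
AnalysisEngineDescription annotatorBInstance =
    AnalysisEngineFactory.createPrimitiveDescription(
        GateDateTimeLengthAnnotator.class, typeSystem);
builder.add(annotatorBInstance);
/* Step 4: Run the pipeline */
SimplePipeline.runPipeline(readerInstance, builder.createAggregate());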
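For reference, the intended data flow (A's consumer writes to the DB, B reads A's annotations back from it) can be sketched in plain Java without uimaFit. This is only an illustration under stated assumptions: AnnotationStore, InMemoryAnnotationStore, DbConsumerA and AnnotatorB are hypothetical names, annotations are reduced to strings, and an in-memory map stands in for the real database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedStoreDemo {
    public static void main(String[] args) {
        // A single store instance is handed to both components
        AnnotationStore store = new InMemoryAnnotationStore();
        new DbConsumerA(store).process("doc1", List.of("DateTime:2013-05-01"));
        System.out.println(new AnnotatorB(store).process("doc1"));
        // prints [B(DateTime:2013-05-01)]
    }
}

// Hypothetical stand-in for the database, keyed by document ID
interface AnnotationStore {
    void save(String docId, List<String> annotations);
    List<String> load(String docId);
}

// In-memory implementation for this sketch only; a real setup would
// presumably talk to the DB configured through the PARAM_DB_* settings
class InMemoryAnnotationStore implements AnnotationStore {
    private final Map<String, List<String>> rows = new HashMap<>();

    @Override
    public void save(String docId, List<String> annotations) {
        rows.computeIfAbsent(docId, k -> new ArrayList<>()).addAll(annotations);
    }

    @Override
    public List<String> load(String docId) {
        // Return a copy so callers cannot modify the stored rows
        return new ArrayList<>(rows.getOrDefault(docId, List.of()));
    }
}

// Hypothetical consumer for annotator A: dumps A's annotations into the store
class DbConsumerA {
    private final AnnotationStore store;

    DbConsumerA(AnnotationStore store) { this.store = store; }

    void process(String docId, List<String> annotationsFromA) {
        store.save(docId, annotationsFromA);
    }
}

// Hypothetical annotator B: reads A's annotations from the store as its input
class AnnotatorB {
    private final AnnotationStore store;

    AnnotatorB(AnnotationStore store) { this.store = store; }

    List<String> process(String docId) {
        List<String> out = new ArrayList<>();
        for (String annotation : store.load(docId)) {
            out.add("B(" + annotation + ")");
        }
        return out;
    }
}
```

The design point is that both components receive the same store object rather than each opening its own connection; that shared object is roughly the role a resource bound to both engines would play in the resource-injection approach, though how well that maps onto a real DB connection is an assumption to check against the wiki page.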
My question is:
Is the approach suggested at https://code.google.com/p/uimafit/wiki/ExternalResources#Resource_injection the right direction to achieve this?
You can define the dependency with @TypeCapability, like this:
@TypeCapability(inputs = { "com.myproject.types.MyType", ... }, outputs = { ... })
public class MyAnnotator extends JCasAnnotator_ImplBase {
....
}
Note that this defines the contract at the level of annotation types, not engines: any engine could create com.myproject.types.MyType.
I don't think there is a way to enforce this contract.
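To make the type-level nature of such a contract concrete, here is a plain-Java sketch that declares capabilities with a hypothetical @Capability annotation (a stand-in, not uimaFit's actual @TypeCapability) and reads them back via reflection. MyAnnotator and the output type com.myproject.types.MyOtherType are placeholders. The metadata holds type names only, so nothing in it says which engine must produce an input.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

class CapabilityDemo {
    public static void main(String[] args) {
        // Read the declared capabilities from the class metadata at runtime
        Capability c = MyAnnotator.class.getAnnotation(Capability.class);
        System.out.println("inputs:  " + Arrays.toString(c.inputs()));
        System.out.println("outputs: " + Arrays.toString(c.outputs()));
    }
}

// Hypothetical stand-in for @TypeCapability: lists type names, not engines.
// RUNTIME retention is needed so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Capability {
    String[] inputs() default {};
    String[] outputs() default {};
}

// Placeholder annotator class carrying the declaration
@Capability(inputs = { "com.myproject.types.MyType" },
            outputs = { "com.myproject.types.MyOtherType" })
class MyAnnotator {
}
```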
I did write some code that checks whether the annotation types an engine requires are produced upstream in the pipeline, and logs an error otherwise (see Pipeline.checkAndAddCapabilities() and Pipeline.addCapabilities()). Note, however, that this only works if all engines declare their type capabilities, which is often not the case when using external engines or libraries.
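The kind of upstream check described above can be sketched in plain Java. This is not the actual Pipeline.checkAndAddCapabilities() code; EngineSpec and findMissingInputs are hypothetical, and each engine's declared input and output type names are passed in explicitly instead of being read from @TypeCapability.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

class CapabilityCheckDemo {
    private static final Logger LOG = Logger.getLogger(CapabilityCheckDemo.class.getName());

    // Walks the engines in pipeline order and collects every declared input type
    // that no earlier engine declared as an output, logging an error for each one.
    static List<String> findMissingInputs(List<EngineSpec> pipeline) {
        Set<String> available = new HashSet<>();
        List<String> missing = new ArrayList<>();
        for (EngineSpec engine : pipeline) {
            for (String type : engine.inputs) {
                if (!available.contains(type)) {
                    LOG.severe(engine.name + " requires " + type
                            + ", but no upstream engine declares it as an output");
                    missing.add(engine.name + ":" + type);
                }
            }
            // An engine's outputs only become available to the engines after it
            available.addAll(engine.outputs);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<EngineSpec> pipeline = List.of(
                new EngineSpec("annotatorA", Set.of(), Set.of("com.myproject.types.MyType")),
                new EngineSpec("annotatorB", Set.of("com.myproject.types.MyType"), Set.of()));
        System.out.println("missing: " + findMissingInputs(pipeline));
    }
}

// Hypothetical engine descriptor: a name plus declared input and output type names
final class EngineSpec {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    EngineSpec(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}
```

As with the real check, an engine that declares nothing looks as if it produces nothing, so every downstream engine that depends on it gets flagged; the check is only meaningful when all engines declare their capabilities.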