information-retrieval uima ducc

How to define multiple CAS Consumers in UIMA DUCC?


I am designing a text mining pipeline in UIMA DUCC as follows:

|-----------------|
|                 | ==CAS_1==> Pipeline A ==> Consumer A 
| CAS Multiplier  | ==CAS_2==> Pipeline B ==> Consumer B
|                 | ==CAS_3==> Pipeline C ==> Consumer C 
|-----------------|

I intend to run Pipelines A, B and C in parallel. I believe this can be done using a flow controller. Is my understanding right? If yes, how do I define multiple CCs? The process_descriptor_CC field in the job description file takes only one consumer. How can I pass multiple consumers, each with its pipeline association?


Solution

  • If the goal is to process a large collection of documents with high throughput, then the three pipelines, each including its CAS consumer, would all go inside the AE (process_descriptor_AE), and that AE would include a custom flow controller to route each CAS to the desired pipeline. Within one AE instance, CASes are processed one at a time, but multiple CM+AE threads can run in parallel if you set the number of JP threads (process_thread_count) to a value greater than 1.
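
To make the routing idea concrete, here is a minimal plain-Java sketch of the decision a custom flow controller would make. It does not use the UIMA API, so it runs on its own. The delegate keys (PipelineA, ConsumerA, ...) and the CAS kinds (CAS_1, ...) are hypothetical names taken from the diagram in the question, not anything DUCC defines. In a real aggregate, logic like this would typically sit in the next() method of the Flow object returned by the flow controller's computeFlow(), and how you tell the three kinds of CAS apart (a feature structure, a view name, and so on) depends on your type system.

```java
import java.util.List;
import java.util.Map;

// Sketch only: models the routing table a custom flow controller could use.
// All names below are illustrative assumptions, not DUCC or UIMA identifiers.
class RoutingSketch {

    // One route per kind of CAS emitted by the CAS multiplier:
    // the pipeline first, then its own consumer.
    private static final Map<String, List<String>> ROUTES = Map.of(
            "CAS_1", List.of("PipelineA", "ConsumerA"),
            "CAS_2", List.of("PipelineB", "ConsumerB"),
            "CAS_3", List.of("PipelineC", "ConsumerC"));

    // Returns the key of the next delegate for this CAS, or null once the
    // CAS has visited every delegate on its route (the "final step").
    static String nextStep(String casKind, int stepsDone) {
        List<String> route = ROUTES.get(casKind);
        if (route == null) {
            throw new IllegalArgumentException("Unknown CAS kind: " + casKind);
        }
        return stepsDone < route.size() ? route.get(stepsDone) : null;
    }

    public static void main(String[] args) {
        // Walk each kind of CAS through its route and print the path taken.
        for (String kind : List.of("CAS_1", "CAS_2", "CAS_3")) {
            StringBuilder path = new StringBuilder(kind);
            int done = 0;
            String step;
            while ((step = nextStep(kind, done)) != null) {
                path.append(" -> ").append(step);
                done++;
            }
            System.out.println(path);
        }
    }
}
```

With this layout each CAS only ever visits its own pipeline and consumer, so a single AE descriptor covers all three branches. Parallelism then comes from running several copies of that AE, as described above, rather than from the flow controller itself.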