java spring spring-batch camunda

Batch vs BPMN processing with long wait states


I am working on a Java project (Spring) where I need to create batch processing for a rather complex scenario. Some of the steps have to wait for a long time, e.g. 20 days. I am not sure how I should tackle this scenario.

My first thought was to do it in a BPMN engine (like Camunda), where wait states are part of the modelling elements. Considering the amount of data that needs to be processed, I would end up with around 250,000 process instances for a single run, and that volume worries me from a performance perspective if the processing is BPMN-based.

The other solution would be to use a framework designed for batch processing, e.g. Spring Batch. My problem in that case is the long wait states. As far as I understand, Spring Batch doesn't support wait states. However, each draft solution I came up with has some limitation.

If I did it in one big batch, the step where the wait can occur would halt until the 20 days are over. This would block the processing of all remaining data. I guess I could mark the affected data item to be processed later, and re-run the batch until every item is processed. This would mean laying out every branch of the processing in a linear fashion, with each step accepting only the data items marked for it.
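To make the "mark it and re-run" idea concrete, here is a rough sketch in plain Java, without Spring Batch. Everything in it is an assumption for illustration: the names `WorkItem` and `DeferringRun`, the step names, and the in-memory list (in practice the items would presumably live in a database table and a scheduler would trigger the run). The point is only that a step records the wait as a date on the item instead of blocking.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical data item; field and step names are illustrative only.
class WorkItem {
    final String id;
    String nextStep;      // step that should handle the item next, or "DONE"
    LocalDate notBefore;  // earliest date on which nextStep may run

    WorkItem(String id, String nextStep, LocalDate notBefore) {
        this.id = id;
        this.nextStep = nextStep;
        this.notBefore = notBefore;
    }
}

class DeferringRun {
    // One batch run: handle every item that is due today and skip the rest
    // without blocking. Returns the number of items handled in this run.
    static int runOnce(List<WorkItem> items, LocalDate today) {
        int handled = 0;
        for (WorkItem item : items) {
            if ("DONE".equals(item.nextStep) || item.notBefore.isAfter(today)) {
                continue; // finished or still waiting: leave it for a later run
            }
            if ("SEND_NOTICE".equals(item.nextStep)) {
                // ... business logic of the first step would go here ...
                item.nextStep = "CHECK_RESPONSE";
                item.notBefore = today.plusDays(20); // record the wait instead of sleeping
            } else if ("CHECK_RESPONSE".equals(item.nextStep)) {
                // ... business logic of the second step would go here ...
                item.nextStep = "DONE";
            }
            handled++;
        }
        return handled;
    }
}
```

Calling `runOnce` once per day would then move each item forward as soon as its wait has expired, at the cost of encoding the process flow in the `nextStep` values.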

If I started a new batch job for every item to be processed, I would need a mechanism to stop at a given point and later resume the job from a given step. A natural solution would be to create a more fine-grained job system, but then I would lose the context information about the processing.

In both cases, I would end up with my own BPMN solution implemented as a state machine in the database...

I would appreciate any help or hint you can give me. Thanks in advance.


Solution

  • I also discussed this issue with my technical lead, and there were several arguments against BPMN (Camunda):

    1. The difficulty of updating a process definition while there are instances still in a wait state (i.e. running processes).
    2. Using BPMN here, and then having to solve scaling and performance for it, could conflict with what we originally intended to use it for (simple message orchestration).

    For these reasons the decision was to go with Spring Batch, where we distinguish between the business batch (as described in the activity diagrams, with the wait states etc.) and the technical batch (the Spring Batch implementation). In that setup, every step can take the current context, perform its operation and update the context. One technical batch would potentially handle every business batch currently due for processing (e.g. the data of the newly created batch and all those whose wait states expired today). This also requires us to create a state machine, probably one that supports parallel execution as well (for example, a Petri net).
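
    As a rough illustration of the Petri net idea mentioned above, here is a minimal sketch in plain Java. It is not our actual implementation; the class names `PetriNet` and `Transition` and all place and transition names are made up for this example. It assumes every arc has weight 1 and that a place appears at most once among a transition's inputs. A transition may fire only when each of its input places holds a token, which is what allows a wait branch and another branch to proceed in parallel and be joined later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transition: consumes one token from each input place
// and produces one token in each output place.
class Transition {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    Transition(String name, List<String> inputs, List<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

class PetriNet {
    // Current marking: number of tokens per place.
    private final Map<String, Integer> marking = new HashMap<>();
    private final List<Transition> transitions = new ArrayList<>();

    void addTokens(String place, int count) {
        marking.merge(place, count, Integer::sum);
    }

    void addTransition(String name, List<String> inputs, List<String> outputs) {
        transitions.add(new Transition(name, inputs, outputs));
    }

    int tokens(String place) {
        return marking.getOrDefault(place, 0);
    }

    // A transition is enabled when every input place holds at least one token.
    boolean isEnabled(Transition t) {
        for (String place : t.inputs) {
            if (tokens(place) < 1) {
                return false;
            }
        }
        return true;
    }

    // Fires the named transition if it is enabled; returns whether it fired.
    boolean fire(String name) {
        for (Transition t : transitions) {
            if (t.name.equals(name) && isEnabled(t)) {
                for (String place : t.inputs) {
                    marking.merge(place, -1, Integer::sum);
                }
                for (String place : t.outputs) {
                    marking.merge(place, 1, Integer::sum);
                }
                return true;
            }
        }
        return false;
    }
}
```

    For example, a `fork` transition could move a token from `start` into both `waiting` and `checking`, and a `join` transition with inputs `waited` and `checked` would stay disabled until both branches have finished. In a setup like ours, the marking would presumably be stored per business batch, and the technical batch would fire the wait-expiry transition once the due date has passed.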