
Batch job definition: How to run a dynamically-calculated number of partitions?


As a newbie to the Batch Processing API (JSR-352), I have some difficulties modeling the following (simplified) scenario:

  1. Suppose we have a Batchlet that produces a dynamic set of files in a first step.
  2. In a second step, all these files must be processed individually in chunks (via ItemReader, ItemProcessor and ItemWriter) resulting in a new set of files.
  3. In a third step these new files need to be packaged in one large archive.

I couldn't find a way to define the second step because the specification doesn't seem to provide a loop construct (and in my understanding partition, split and flow only work for a set with a known fixed size).

What could a job XML definition look like? Do I have to give up on chunking in the second step, or divide the task into multiple jobs? Is there another option?


Solution

  • You can use a PartitionMapper to programmatically define a dynamic number of partitions for a partitioned step.

    The mapper needs to create a PartitionPlan object which sets the number of partitions and provides partition-specific properties for each.

    Your mapper's mapPartitions() method will look something like this outline:

    public PartitionPlan mapPartitions() throws Exception {
    
        int numPartitions = ...; // calculate the number of partitions however you want
    
        // create an array of Properties objects, one for each partition
        Properties[] props = new Properties[numPartitions];
    
        for (int i = 0; i < numPartitions; i++) {
            // create a Properties object for this partition
            props[i] = new Properties();
    
            props[i].setProperty("abc", ...);
            props[i].setProperty("xyz", ...);
        }
    
        // use the built-in PartitionPlanImpl from the spec or your own impl
        PartitionPlan partitionPlan = new PartitionPlanImpl(); 
        partitionPlan.setPartitions(numPartitions);
    
        // set the Properties[] onto your plan
        partitionPlan.setPartitionProperties(props);
    
        return partitionPlan;
    }
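
For the scenario in the question, the number of partitions could simply be the number of files the first step produced. The following is a minimal, stdlib-only sketch of that calculation, not part of the Batch API: the class name `FilePartitionProperties`, the method `forDirectory`, and the property keys `inputFile` and `outputFile` are all hypothetical names chosen for illustration, and it assumes the first step wrote its files into a single known directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of JSR-352): builds one Properties object
 * per regular file found directly inside a directory.
 */
final class FilePartitionProperties {

    private FilePartitionProperties() { }

    static Properties[] forDirectory(Path inputDir) throws IOException {
        // Collect the regular files only; subdirectories are ignored.
        List<Path> files = new ArrayList<>();
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).forEach(files::add);
        }
        // Sort so that partition i always maps to the same file for a given directory.
        Collections.sort(files);

        Properties[] props = new Properties[files.size()];
        for (int i = 0; i < props.length; i++) {
            props[i] = new Properties();
            // Assumed keys; they must match the names used in the job XML.
            props[i].setProperty("inputFile", files.get(i).toString());
            props[i].setProperty("outputFile", files.get(i) + ".out");
        }
        return props;
    }
}
```

Inside mapPartitions() you could then use the length of the returned array as the partition count (partitionPlan.setPartitions(props.length)) and pass the array itself to partitionPlan.setPartitionProperties(props).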
    

    You can then reference the partition-specific property values through property substitution, exactly as you would reference statically defined partition properties:

        <batchlet ref="myBatchlet">
            <properties>
                <property name="propABC" value="#{partitionPlan['abc']}" />
                <property name="propXYZ" value="#{partitionPlan['xyz']}" />
            </properties>
        </batchlet>
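
Applied to the three-step scenario in the question, a job definition might look roughly like the sketch below. It is only an outline under stated assumptions: every id and ref (produceFilesBatchlet, fileItemReader, fileItemProcessor, fileItemWriter, filePartitionMapper, packageFilesBatchlet) is a hypothetical artifact name, the item-count is arbitrary, and the mapper is assumed to set the hypothetical keys inputFile and outputFile for each partition.

```xml
<job id="fileJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">

    <!-- Step 1: a batchlet produces the dynamic set of files -->
    <step id="produceFiles" next="processFiles">
        <batchlet ref="produceFilesBatchlet" />
    </step>

    <!-- Step 2: one chunk-processing partition per file, planned by the mapper -->
    <step id="processFiles" next="packageFiles">
        <chunk item-count="100">
            <reader ref="fileItemReader">
                <properties>
                    <property name="inputFile" value="#{partitionPlan['inputFile']}" />
                </properties>
            </reader>
            <processor ref="fileItemProcessor" />
            <writer ref="fileItemWriter">
                <properties>
                    <property name="outputFile" value="#{partitionPlan['outputFile']}" />
                </properties>
            </writer>
        </chunk>
        <partition>
            <mapper ref="filePartitionMapper" />
        </partition>
    </step>

    <!-- Step 3: a batchlet packages the new files into one archive -->
    <step id="packageFiles">
        <batchlet ref="packageFilesBatchlet" />
    </step>

</job>
```

With this layout the second step keeps its chunk processing, and the packaging step only starts once the partitioned step as a whole has finished.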