
Are batchlets the correct way to implement ETL steps in Java EE Batch?


I am studying the Java EE Batch API (JSR-352) in order to test the feasibility of replacing our current ETL tool with our own solution built on this technology.

My goal is to build a job in which I:

  • get some (dummy) data from a datasource in step1,
  • get some other data from another datasource in step2, and
  • merge them in step3.

I would like to process each item and, instead of writing it to a file, pass it on to the next step, while also storing the information for later use. I could do that using batchlets and jobContext.setTransientUserData().

I think I am not getting the concepts right: as far as I understand, JSR-352 is meant for this kind of ETL task, but it has two types of steps: chunks and batchlets. Chunks are "3-phase steps", in which one reads, processes and writes the data. Batchlets are tasks that are not performed for each item of the data, but only once (such as calculating totals, sending email, and so on).

My problem is that my solution does not look correct if I go by the definition of batchlets.

How could one implement this kind of job using the Java EE Batch API?


Solution

  • I think you had better use chunks rather than batchlets to implement ETL. Typical chunk processing with a datasource looks something like the following:

    • ItemReader#open(): open a cursor (create the Connection, Statement and ResultSet) and save them as instance variables of the ItemReader.
    • ItemReader#readItem(): create and return an object that contains the data of one row, using the ResultSet.
    • ItemReader#close(): close the JDBC resources.
    • ItemProcessor#processItem(): do the calculation, then create and return an object that contains the result.
    • ItemWriter#writeItems(): save the calculated data to the database: open a Connection and Statement, invoke executeUpdate(), and close them.
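
To make the lifecycle above concrete, here is a minimal sketch that runs without an application server. Everything in it is an assumption made for illustration: Reader, Processor and Writer are local stand-ins that only mirror the shape of the javax.batch.api.chunk interfaces, runStep is a much simplified version of what a batch runtime does for a chunk step, and an in-memory list plays the role of the ResultSet. In a real job you would implement ItemReader, ItemProcessor and ItemWriter and let the runtime drive them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ChunkLifecycleSketch {

    // Local stand-ins, NOT the real API. The real javax.batch.api.chunk
    // interfaces also deal with checkpoints and declare "throws Exception".
    interface Reader<T> { void open(); T readItem(); void close(); }
    interface Processor<I, O> { O processItem(I item); }
    interface Writer<O> { void writeItems(List<O> items); }

    /** Simplified loop for one chunk step: read and process items, write them in chunks. */
    static <I, O> void runStep(Reader<I> reader, Processor<I, O> processor,
                               Writer<O> writer, int itemCount) {
        reader.open(); // e.g. create Connection, Statement and ResultSet
        try {
            List<O> chunk = new ArrayList<>();
            I item;
            while ((item = reader.readItem()) != null) { // null marks end of input
                chunk.add(processor.processItem(item));
                if (chunk.size() == itemCount) {         // chunk is full: hand it to the writer
                    writer.writeItems(chunk);
                    chunk = new ArrayList<>();
                }
            }
            if (!chunk.isEmpty()) {
                writer.writeItems(chunk);                // flush the last, partial chunk
            }
        } finally {
            reader.close(); // e.g. close the JDBC resources
        }
    }

    /** Runs the step over in-memory rows and returns every chunk the writer received. */
    static List<List<Integer>> demo(final List<Integer> rows, int itemCount) {
        final List<List<Integer>> written = new ArrayList<>();
        Reader<Integer> reader = new Reader<Integer>() {
            private Iterator<Integer> cursor;            // plays the role of a ResultSet
            public void open() { cursor = rows.iterator(); }
            public Integer readItem() { return cursor.hasNext() ? cursor.next() : null; }
            public void close() { cursor = null; }
        };
        Processor<Integer, Integer> doubler = item -> item * 2; // hypothetical calculation
        Writer<Integer> writer = items -> written.add(items);   // stands in for executeUpdate()
        runStep(reader, doubler, writer, itemCount);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList(1, 2, 3, 4, 5), 2)); // [[2, 4], [6, 8], [10]]
    }
}
```

With five rows and a chunk size of 2, the writer is called three times. A real runtime additionally wraps each chunk in a transaction and records a checkpoint, which this sketch leaves out.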

    As for your situation, I think you have to choose one data set that can be considered the primary one and open a cursor for it in ItemReader#open(), then fetch the other one in ItemProcessor#processItem() for each item.
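
A rough sketch of that merge idea, under stated assumptions: Order, the customer-name map and the "name:amount" output format are all hypothetical, and the Map stands in for whatever per-item query you would run against the second datasource. The static processItem here only plays the role of ItemProcessor#processItem(); it is not the real interface method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MergeInProcessorSketch {

    /** Hypothetical row from the primary source, the one the reader's cursor walks. */
    static final class Order {
        final int customerId;
        final int amount;
        Order(int customerId, int amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    /**
     * Plays the role of ItemProcessor#processItem(): for one primary row, fetch the
     * matching secondary data and return the merged result.
     */
    static String processItem(Order order, Map<Integer, String> customerNamesById) {
        String name = customerNamesById.get(order.customerId);
        if (name == null) {
            name = "unknown"; // assumption: keep unmatched rows instead of dropping them
        }
        return name + ":" + order.amount;
    }

    /** Applies processItem to every primary row, as the runtime would between read and write. */
    static List<String> mergeAll(List<Order> orders, Map<Integer, String> customerNamesById) {
        List<String> merged = new ArrayList<>();
        for (Order order : orders) {
            merged.add(processItem(order, customerNamesById));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = new HashMap<>();
        customers.put(1, "alice");
        customers.put(2, "bob");
        List<Order> orders = Arrays.asList(new Order(1, 30), new Order(2, 15), new Order(3, 7));
        System.out.println(mergeAll(orders, customers)); // [alice:30, bob:15, unknown:7]
    }
}
```

Note that a real per-item lookup costs one query per primary row, so for large inputs it may be worth checking whether the secondary data can be fetched in fewer round trips.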

    I also recommend reading these useful examples of chunk processing:

    My blog entries about JBatch and chunk processing: