
Break down one big step into multiple steps in Spring Batch


I am new to batch processing and am trying to solve the problem described below using Spring Batch. I am struggling with how to turn it into a multi-step batch job.

Given

A CSV file containing records for multiple students:

studentId subject1_score subject2_score subject3_score result
1 59 51 54 PENDING
2 79 20 76 PENDING

We have a REST endpoint which takes each student's marks in all subjects and returns a result (pass/fail) for each student. The pass/fail logic is defined inside that REST endpoint.

TODO

Read a batch of records from the CSV file and make one REST call per batch, which returns the result for each student based on the marks in all three subjects. Update the result for each student and generate an output CSV file containing all the records.

class StudentMarksheet {
    String studentId;
    Integer subject1_score;
    Integer subject2_score;
    Integer subject3_score;
    String result;

    ...
}

class GenerateResultRequestResponseDto {
    Long batchId;
    List<StudentMarksheet> students;

    ...
}
Expected output

studentId subject1_score subject2_score subject3_score result
1 59 51 54 PASS
2 79 20 76 FAIL

Update on Requirement

We can receive either a CSV or an XML file. Depending on the file type, we have two different readers and writers (one pair for reading and writing CSV files and one pair for XML files).

My Design solution

Read a single record and create a StudentMarksheet object from it -> the processor decides whether the record is valid -> the writer prepares the GenerateResultRequestResponseDto, executes the REST call for one batch of records and writes the batch to the CSV file.

The big question here is: do I make two jobs, one for CSV and the other for XML?


Solution

  • Since your REST endpoint accepts a list of students, and you need to process them in chunks just before writing them to the file, you can use an ItemWriteListener#beforeWrite(List) and make your call in there. This listener is the first extension point where you get a list of items. So your chunk-oriented step could be designed as follows:

    • Item reader: FlatFileItemReader to read students one by one
    • Item processor: validate students
    • ItemWriteListener: Make the REST call for the current chunk of students and update their statuses
    • ItemWriter: write updated students to the output file
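
The order of operations in the step above can be sketched in plain Java. This is only a model of the chunk lifecycle and does not use Spring Batch classes: ChunkFlowSketch, run, isValid and fakeRestCall are hypothetical names, the 0..100 validation range is an assumption, and the "every score at least 40" rule is an invented stand-in for the real endpoint, whose logic the question does not give. In a real step the Consumer below would be an ItemWriteListener#beforeWrite implementation (in Spring Batch 5 that method receives a Chunk rather than a List).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of one chunk-oriented step; no Spring Batch classes are used.
class ChunkFlowSketch {

    // Mutable item, loosely mirroring StudentMarksheet from the question.
    static class StudentMarksheet {
        final String studentId;
        final int s1, s2, s3;
        String result;

        StudentMarksheet(String studentId, int s1, int s2, int s3, String result) {
            this.studentId = studentId;
            this.s1 = s1; this.s2 = s2; this.s3 = s3;
            this.result = result;
        }

        String toLine() {
            return studentId + " " + s1 + " " + s2 + " " + s3 + " " + result;
        }
    }

    // "Reader": turn one whitespace-separated line into an item.
    static StudentMarksheet read(String line) {
        String[] f = line.trim().split("\\s+");
        return new StudentMarksheet(f[0], Integer.parseInt(f[1]),
                Integer.parseInt(f[2]), Integer.parseInt(f[3]), f[4]);
    }

    // "Processor": invalid items are filtered out (assumed rule: scores in 0..100).
    static boolean isValid(StudentMarksheet s) {
        return s.s1 >= 0 && s.s1 <= 100 && s.s2 >= 0 && s.s2 <= 100
                && s.s3 >= 0 && s.s3 <= 100;
    }

    // Hypothetical stand-in for the REST call: one invocation per chunk.
    // The threshold of 40 is invented; the real rule lives in the endpoint.
    static void fakeRestCall(List<StudentMarksheet> chunk) {
        for (StudentMarksheet s : chunk) {
            int lowest = Math.min(s.s1, Math.min(s.s2, s.s3));
            s.result = lowest >= 40 ? "PASS" : "FAIL";
        }
    }

    // Read chunkSize lines, process each, call beforeWrite once with the
    // surviving items, then "write" them to the returned output list.
    static List<String> run(List<String> lines, int chunkSize,
                            Consumer<List<StudentMarksheet>> beforeWrite) {
        List<String> output = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += chunkSize) {
            List<StudentMarksheet> chunk = new ArrayList<>();
            for (String line : lines.subList(i, Math.min(i + chunkSize, lines.size()))) {
                StudentMarksheet s = read(line);
                if (isValid(s)) {
                    chunk.add(s);
                }
            }
            if (!chunk.isEmpty()) {
                beforeWrite.accept(chunk);      // listener: update results
                for (StudentMarksheet s : chunk) {
                    output.add(s.toLine());     // writer: emit updated items
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1 59 51 54 PENDING", "2 79 20 76 PENDING");
        // Prints "1 59 51 54 PASS" then "2 79 20 76 FAIL"
        run(input, 2, ChunkFlowSketch::fakeRestCall).forEach(System.out::println);
    }
}
```

Because the results are updated on the item objects before the writer sees them, the writer itself stays a plain file writer and knows nothing about the REST endpoint.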