My step is supposed to write a large number of items to a DB table with a unique index on a few columns, so some items will produce a DataIntegrityViolationException. I want to make my step fault-tolerant to that without setting the chunk size to 1. The following configuration unfortunately does not work as expected and just skips the whole chunk when the exception occurs; I have probably misread something here:
stepBuilderFactory.get("someStep")
    .<InputDto, Entity>chunk(100)
    .reader(reader())
    .processor(processor())
    .writer(repositoryItemWriter(repository()))
    .faultTolerant()
    .skipLimit(Integer.MAX_VALUE)
    .skip(DataIntegrityViolationException.class)
    .noRollback(DataIntegrityViolationException.class)
    .processorNonTransactional()
    .build();
Also, as mentioned here, the behaviour of exception skipping in a chunk sounds rather expensive, doesn't it? What is the most efficient way to deal with it, then? A SELECT for a uniqueness check before each insert doesn't look great either.
some items will produce a DataIntegrityViolationException. I want to make my step fault-tolerant to that
DataIntegrityViolationException is not a fault you want to tolerate. It is a fault that you want your job to fail on. A transient error, such as a temporary network issue, could be tolerated, but an error related to data integrity or consistency should not be.
The following configuration unfortunately does not work as expected and just skips the whole chunk when the exception occurs [..] Also, as mentioned here, the behaviour of exception skipping in a chunk sounds rather expensive, doesn't it?
The exception you are getting happens at the commit time of the transaction, so Spring Batch cannot know which item(s) caused the issue. Hence it scans the chunk item by item to identify the faulty item and skip it. And yes, that has a cost. The answer you refer to explains the mechanism in detail.
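To make the cost concrete, here is a minimal sketch of the scanning idea in plain Java. It is a simplified model, not Spring Batch's actual implementation: `ChunkScanSketch`, `UniqueTable` and `writeWithScan` are hypothetical names, and the in-memory set only stands in for a table with a unique index. It assumes that a failed chunk write leaves nothing behind (as a rolled-back transaction would) and that each item is then retried on its own.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of chunk scanning; NOT Spring Batch's real code.
final class ChunkScanSketch {

    /** Hypothetical stand-in for a table with a unique index on the key. */
    static final class UniqueTable {
        private Set<String> rows = new HashSet<>();
        int writeAttempts = 0; // counts every write call, successful or not

        /** Writes all items atomically; on a duplicate it throws and changes nothing. */
        void writeAll(List<String> items) {
            writeAttempts++;
            Set<String> staged = new HashSet<>(rows);
            for (String item : items) {
                if (!staged.add(item)) {
                    throw new IllegalStateException("duplicate key: " + item);
                }
            }
            rows = staged; // "commit" only if every item was accepted
        }

        int size() {
            return rows.size();
        }
    }

    /** Tries the whole chunk first; on failure retries item by item and returns the skipped items. */
    static List<String> writeWithScan(UniqueTable table, List<String> chunk) {
        List<String> skipped = new ArrayList<>();
        try {
            table.writeAll(chunk);
            return skipped; // happy path: one write for the whole chunk
        } catch (IllegalStateException wholeChunkFailed) {
            // The failure does not say which item was bad, so scan the chunk.
        }
        for (String item : chunk) {
            try {
                table.writeAll(List.of(item)); // one write (and commit) per item
            } catch (IllegalStateException duplicate) {
                skipped.add(item);
            }
        }
        return skipped;
    }
}
```

In this model, a chunk of 100 items containing a single duplicate costs one failed chunk write plus 100 single-item writes, which is why frequent skips effectively degrade the step towards a chunk size of 1.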
What is the most efficient way to deal with it, then? A SELECT for a uniqueness check before each insert doesn't look great either.
In my opinion, skipping the DataIntegrityViolationException is working around the problem rather than fixing it. I would add a processor or a listener (ItemWriteListener#beforeWrite) that checks data integrity constraints and rejects faulty items before they are written. Data validation is a typical use case for an item processor.
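As an illustration of the processor approach, here is a minimal sketch of a filtering processor. To keep it self-contained, it declares a local `ItemProcessor` interface that mirrors Spring Batch's `org.springframework.batch.item.ItemProcessor`; in a real job you would implement the Spring Batch interface instead, where returning `null` from `process` filters the item out of the chunk. `UniqueKeyFilteringProcessor`, `keyExtractor` and `alreadyStored` are hypothetical names, and the existence check is an assumption about how you would look up the unique columns.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

final class UniqueKeyFilterSketch {

    /** Local stand-in mirroring org.springframework.batch.item.ItemProcessor. */
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    /** Drops items whose unique key was already seen in this run or already exists in the store. */
    static final class UniqueKeyFilteringProcessor<T, K> implements ItemProcessor<T, T> {
        private final Function<T, K> keyExtractor;  // builds the key from the unique columns
        private final Predicate<K> alreadyStored;   // hypothetical lookup, e.g. an exists query
        private final Set<K> seenInThisRun = new HashSet<>();

        UniqueKeyFilteringProcessor(Function<T, K> keyExtractor, Predicate<K> alreadyStored) {
            this.keyExtractor = keyExtractor;
            this.alreadyStored = alreadyStored;
        }

        @Override
        public T process(T item) {
            K key = keyExtractor.apply(item);
            // Duplicates within the run are caught in memory, without a lookup.
            if (!seenInThisRun.add(key) || alreadyStored.test(key)) {
                return null; // null means "filter this item out"
            }
            return item;
        }
    }
}
```

This sketch is not thread-safe and keeps every key of the run in memory, so it only suits single-threaded steps with a manageable number of keys. If other processes write to the same table concurrently, the check can still race with them, and the unique index remains the final safeguard.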