How to filter items in bulk?

I'm implementing a Job with Spring Batch, and my simple scenario is:

1. read items from a file
2. process each item
3. write item chunks to a database

The problem is that I want to filter out the items that are already present in the database.

My first try was to query the database in step two for the current item and, if it was already present, return null from the ItemProcessor so that the item is filtered out. This was unnecessarily slow because it needs one query for each processed item.

So my second try was to override the doWrite method in the ItemWriter to make a single query for the whole chunk and write only the items that the query did not match. Performance improved noticeably, but this doesn't look right to me: Spring Batch cannot see what I'm actually writing to the database, so the write and filter counters from the StepContext have wrong values.
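
To make the second attempt concrete, here is a minimal sketch of the idea in plain Java, without any Spring Batch types. The class and method names (ChunkFilterSketch, findExisting, writeNewOnly) are made up for illustration, and an in-memory set stands in for the database table and the single lookup query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: plain JDK types stand in for the database and the writer.
class ChunkFilterSketch {

    // Stand-in for the ids already stored; a real job would query the database.
    private final Set<String> storedIds = new HashSet<>();

    ChunkFilterSketch(List<String> alreadyStored) {
        storedIds.addAll(alreadyStored);
    }

    // Simulates ONE lookup for the whole chunk instead of one query per item.
    Set<String> findExisting(List<String> chunkIds) {
        Set<String> found = new HashSet<>(chunkIds);
        found.retainAll(storedIds);
        return found;
    }

    // Writes only the ids without a match and returns what was actually written.
    List<String> writeNewOnly(List<String> chunkIds) {
        Set<String> existing = findExisting(chunkIds);
        List<String> written = new ArrayList<>();
        for (String id : chunkIds) {
            // storedIds.add also rejects duplicates inside the same chunk.
            if (!existing.contains(id) && storedIds.add(id)) {
                written.add(id);
            }
        }
        return written;
    }
}
```

The list returned by writeNewOnly can be shorter than the chunk that was passed in, and that difference is exactly what the framework never sees when the filtering happens inside the writer.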

What is the correct way to implement this processing logic?

Solution

What you seem to be calling skip (as in, skip the items that are already present in the database) is actually filtering items, not skipping them, in Spring Batch's terminology. The skip feature in Spring Batch is for invalid items (i.e., those that cause an exception to be thrown while reading, processing or writing them).

A common technique that could work well in your use case is to make the item writer idempotent by doing a "save or update" operation. This removes the need to check whether an item is already present. Moreover, it is useful in case of failure, as you can simply re-run the failed job without having to store any progress state.
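
As a rough illustration of what "idempotent" means here, the sketch below uses plain Java with a map keyed by item id in place of the database table. The names (IdempotentWriterSketch, Item, write, contents) are hypothetical and are not Spring Batch API. Against a real database the same effect usually comes from an upsert statement, such as MERGE or PostgreSQL's INSERT ... ON CONFLICT, depending on what your database supports.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a map keyed by item id stands in for the database table.
class IdempotentWriterSketch {

    // Minimal item type for the example: a key plus a payload.
    static final class Item {
        final String id;
        final String value;

        Item(String id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    private final Map<String, String> table = new LinkedHashMap<>();

    // "Save or update": put inserts a missing key and overwrites an existing one,
    // so no existence check is needed before writing.
    void write(List<Item> chunk) {
        for (Item item : chunk) {
            table.put(item.id, item.value);
        }
    }

    // Exposes the simulated table so the effect of repeated writes can be inspected.
    Map<String, String> contents() {
        return table;
    }
}
```

Writing the same chunk twice leaves the simulated table in the same state as writing it once, which is why a failed job can be restarted from the beginning without tracking which items were already stored.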