spring, spring-batch

Cache lookup in the processor is taking a lot of time - Spring Batch project


I'm using Spring Batch XML-based configuration in my project. I've implemented the logic by referring to: Spring Batch With Annotation and Caching

Currently we're reading a flat file that contains Customer Booking information (roughly 14 fields per record). The cache holds about 1 million records and uses the org.springframework.cache.concurrent.ConcurrentMapCacheManager implementation.

Now I have request data with 15 million records in a flat file (60 fields per record). In the processor, we look up each item in the cache and create the final object from it.

private Optional<SomeBooking> performLookup(List<SomeBooking> existingCacheData, SomeItem item) {
    // Linear scan of the cached list for a booking whose id and someNum match the item
    return existingCacheData.stream()
            .filter(booking -> item.getId().equals(booking.getId()) && item.getSomeNum().equals(booking.getSomeNum()))
            .findAny();
}

Another code snippet:

List<SomeBooking> existingCacheData =
        (List<SomeBooking>) cacheManager.getCache("reference-data").get("data").get();
Optional<SomeBooking> opBooking = performLookup(existingCacheData, item);
if (opBooking.isPresent()) {
    SomeBooking booking = opBooking.get();
    ..........
    .........
}

The cache lookup takes almost 10 ms to 30 ms per item, and processing 15 million records takes around 10 hours, which is not acceptable. The final output now has 74 fields per record.

Time taken: 145 milliseconds
Time taken: 143 milliseconds
Time taken: 89 milliseconds
Time taken: 133 milliseconds
Time taken: 141 milliseconds
Time taken: 58 milliseconds
Time taken: 67 milliseconds
Time taken: 134 milliseconds
Time taken: 131 milliseconds
Time taken: 142 milliseconds
Time taken: 117 milliseconds
Time taken: 140 milliseconds
Time taken: 84 milliseconds
Time taken: 86 milliseconds
Time taken: 133 milliseconds
Time taken: 107 milliseconds
Time taken: 86 milliseconds
Time taken: 38 milliseconds
Time taken: 75 milliseconds
Time taken: 125 milliseconds
Time taken: 76 milliseconds
Time taken: 84 milliseconds
Time taken: 132 milliseconds
Time taken: 68 milliseconds
Time taken: 135 milliseconds
Time taken: 97 milliseconds

Can you please suggest how we can improve the performance? Here we're reading the flat file and creating multiple output files (using org.springframework.batch.item.support.ClassifierCompositeItemWriter) when certain conditions match. We're also using org.springframework.batch.item.file.MultiResourceItemWriter to create multiple versions of the flat file once itemCountLimitPerResource is reached. So we're already using fairly complex beans in our project.
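
For reference, the writer wiring is roughly equivalent to the Java sketch below (our real beans are declared in XML; the routing condition, file path, item type accessor, and count limit are illustrative only):

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.core.io.FileSystemResource;

public class WriterWiringSketch {

    // Rolls over to a new output file once itemCountLimitPerResource items have been written.
    public MultiResourceItemWriter<FinalBooking> multiResourceWriter() {
        FlatFileItemWriter<FinalBooking> delegate = new FlatFileItemWriter<>();
        delegate.setName("bookingFileWriter");
        delegate.setLineAggregator(new PassThroughLineAggregator<>());

        MultiResourceItemWriter<FinalBooking> writer = new MultiResourceItemWriter<>();
        writer.setResource(new FileSystemResource("output/bookings")); // illustrative path
        writer.setDelegate(delegate);
        writer.setItemCountLimitPerResource(500_000);                   // illustrative limit
        return writer;
    }

    // Routes each item to one of several writers based on a condition.
    public ClassifierCompositeItemWriter<FinalBooking> classifierWriter(
            ItemWriter<FinalBooking> matchedWriter, ItemWriter<FinalBooking> defaultWriter) {
        ClassifierCompositeItemWriter<FinalBooking> writer = new ClassifierCompositeItemWriter<>();
        // isSpecialCase() stands in for whatever condition decides the target file
        writer.setClassifier(item -> item.isSpecialCase() ? matchedWriter : defaultWriter);
        return writer;
    }
}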


Solution

  • It is not clear from your post how big the list existingCacheData is, but it seems counter-intuitive to cache a list and then iterate over it on every lookup. For this use case one would normally cache a data structure such as a Map, keyed on getId() and getSomeNum(), provided the list is large enough that the Map's amortized constant-time lookup beats the average linear scan. Alternatively, you could cache the individual objects and retrieve each one directly from the cache with a similarly derived key. A sketch of the Map-based approach follows below.
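
A minimal sketch, assuming getId() and getSomeNum() together uniquely identify a booking and the reference data is loaded once up front; the key format and class/method names are illustrative:

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class BookingIndexSketch {

    // Build the index once (e.g. when the reference data is loaded) and cache the Map
    // instead of the raw List.
    static Map<String, SomeBooking> buildIndex(List<SomeBooking> bookings) {
        return bookings.stream()
                .collect(Collectors.toMap(
                        b -> key(b.getId(), b.getSomeNum()),
                        Function.identity(),
                        (first, duplicate) -> first)); // keep the first entry on duplicate keys
    }

    // Composite key derived from the two fields used in the original filter.
    static String key(Object id, Object someNum) {
        return id + "|" + someNum;
    }

    // Constant-time replacement for the stream-based performLookup.
    static Optional<SomeBooking> performLookup(Map<String, SomeBooking> index, SomeItem item) {
        return Optional.ofNullable(index.get(key(item.getId(), item.getSomeNum())));
    }
}

With roughly 1 million cached bookings, this turns each per-item lookup from a scan over the whole list into a single hash lookup, which is likely where most of the 10 hours is being spent.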