Search code examples
springhibernatetransactionsbatch-processinglarge-data

Spring,Hibernate - Batch processing of large amounts of data with good performance


Imagine you have large amount of data in database approx. ~100Mb. We need to process all data somehow (update or export to somewhere else). How to implement this task with good performance ? How to setup transaction propagation ?

Example 1# (with bad performance) :

@Singleton
public ServiceBean {

 procesAllData(){

   List<Entity> entityList = dao.findAll();

   for(...){
     process(entity);
   }

 }

 private void process(Entity ent){
  //data processing    
  //saves data back (UPDATE operation) or exports to somewhere else (just READs from DB)
 }

}

What could be improved here ?

In my opinion :

  1. I would set hibernate batch size (see hibernate documentation for batch processing).
  2. I would separated ServiceBean into two Spring beans with different transactions settings. Method processAllData() should run out of transaction, because it operates with large amounts of data and potentional rollback wouldnt be 'quick' (i guess). Method process(Entity entity) would run in transaction - no big thing to make rollback in the case of one data entity.

Do you agree ? Any tips ?


Solution

  • Here are 2 basic strategies:

    1. JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
    2. Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.

    You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.