Search code examples
oracle-databasehibernatejdbceclipse-rcptomcat8

What's the effect of removing @GeneratedValue from the Id on the performance


I'm working on a RCP-application, which communicates with an a Tomcat-server using Rest. Since we've gotten more and more data, the load/copy-routines are slowly but surly becoming obsolete. It's taking me sometimes minutes to execute some copy-operation. So I'm looking for some advise, how to speed up my routines.

Here are the technologies I'm using:

  1. RCP-Client (e4-plattform)
  2. Tomcat8-Server
  3. Oracle-DB
  4. JDBC as API with Hibernate
  5. Rest

First thing first. I checked the entities and the pretty much all look like the code below

@Entity
@SequenceGenerator(name = "CHECKITEM_SEQ", sequenceName = "CHECKITEM_SEQ", allocationSize = 1)
public class CheckItem extends AbstractTreeNode implements Serializable,Cloneable {...}

I figured by copying data (which are most of the time over 200K per operation) since I use them as primary key,

@Id
@GeneratedValue(generator = "CHECKITEM_SEQ", strategy = GenerationType.SEQUENCE)
    public Integer getId() {
        return id;
    }

the DB must generate per object a sequence and check the constraint on it, So I was wondering how much performance I would gain, if I remove the Sequence since i don't really use/need them in the DB. Now my questions:

  1. Is there anything that speaks against removing a constraint(primary key in this particularly case) in DB?
  2. Has anyone more/better suggestions how to increase the performance of DB for such operations?
  3. Can I have a tutorial or document, which can help me through this process?

I hope, i was clear enough and I will appreciate any kind of help. thanks already.


Solution

  • The problem with using @GeneratedValue identifiers is that in order for Hibernate to place the new entity into the Persistence Context (the first level cache), it must know the identifier. So when you're using IDENTITY or SEQUENCE based identifiers, this can impact the JDBC driver from being able to adequately batch insert operations.

    For example, you illustrated that most of your entities use the following sequence generation:

    @SequenceGenerator(
       name = "CHECKITEM_SEQ", 
       sequenceName = "CHECKITEM_SEQ", 
       allocationSize = 1)
    

    So whenever a persist operation for an entity happens, you're telling the sequence generator to only generate one value, so the JDBC communication looks like this:

    1. Get Next Sequence
    2. Insert
    3. Get Next Sequence
    4. Insert
    5. Get Next Sequence
    6. Insert
    

    As seen here, we cannot batch the insert operations because we must fetch the identifier for each insert operation before the insert operation can happen. One solution to minimize that impact and deal with batch inserts is to use a larger allocationSize.

    1. allocationSize=10 -> Get Next 10 sequences
    2 - 11. Perform 10 inserts in batch
    Repeat
    

    As you can see here the driver can do 10 inserts in a batch, Hibernate allocates the sequences in batches on 10 and so the inserts can happen much faster.

    Obviously this comes with a small drawback, if you allocate 10 sequences but the remaining batch only needs to insert 6 entities; you've wasted 4 sequence values but you gain the performance from being able to support doing jdbc batch inserts.

    The next logical step would be to determine if you can eliminate the use of a @GeneratedValue all together as that would given you the maximum performance with batch inserts for your copy operations; however that may not be possible with your data model. In the past when I dealt with moving large volumes of data, I tried to define the primary key based on natural keys from the data without involving a surrogate key if possible.

    Feel free to read more about JDBC batch operations here.