Search code examples
postgresqljdbceclipselinkjpa-2.1

Eclipselink and Postgresql batch writing


I have been working on a BulkSMS solution for one of my customers, and I have decided to use JPA (Eclipselink) as an ORM and the underlying database is PostgreSQL 9.5.1.

My problem issue is that whenever I send a request with 65,000 records to be persisted it takes around 27 seconds to complete the operation. I decided to implement sequence pooling, sequence preallocation =1000, and batch writing, however this only managed to remove 15 seconds from the operation.

After investigating the database logs I noticed that the same queries are being called before and after applying the optimization.

Here is my optimized persistance.xml:

<persistence version="2.1" xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="com.kw.ktt.sms.server" transaction-type="JTA">
    <jta-data-source>SMSDB</jta-data-source>
     <non-jta-data-source>sequence</non-jta-data-source> 
    <class>com.kw.ktt.sms.server.core.TestClass</class>
    <class>com.kw.ktt.sms.server.jpa.Customer</class>
    <class>com.kw.ktt.sms.server.jpa.SMSAccount</class>
    <class>com.kw.ktt.sms.server.jpa.SMSTransaction</class>
    <class>com.kw.ktt.sms.server.jpa.ContactGroup</class>
    <class>com.kw.ktt.sms.server.jpa.PhoneNumber</class>
    <properties>
        <property name="eclipselink.application-location" value="/Users/mousaalsulaimi/Desktop"/>
        <property name="eclipselink.ddl-generation.output-mode" value="database"/>
        <property name="eclipselink.logging.connection" value="true"/>
        <property name="javax.persistence.schema-generation.database.action" value="drop-and-create"/>
        <property name="eclipselink.ddl-generation" value="drop-and-create-tables"/>
        <property name="eclipselink.jdbc.batch-writing" value="JDBC" /> 
        <property name="eclipselink.jdbc.batch-writing.size" value="1000"/> 
        <property name="eclipselink.jdbc.sequence-connection-pool"  value="true" />
        <property name="eclipselink.connection-pool.sequence.nonJtaDataSource" value="sequence"/>
        <property name="eclipselink.connection-pool.sequence.intial" value="1000" /> 
    </properties>
</persistence-unit>

As mentioned above I'm using a JTA connection pool for persistence (called SMSDB) and non JTA connection for sequencing (called sequence) each with a different DB user in order to easily track the connection the DB log.

Logs for the unoptimized connection are here - this is a sample of 10 records only.

The logs for the optimized connection are here - this is a sample of 10 records only.

Can someone explain to me what I am doing wrong and why both persistence setting produce the same queries even though there is a actual improvement of 15 seconds.

One more thing , I have set the sequence preallocation to 1000 in the Entitiy's source code and judging from the database logs sequencing is working as expected and is picking up the correct increment value . What I'm concerned with is the batch writing and I fear that it is not set up correctly in the persistence.xml

Update

I have enables the logging in eclipse link as advised by Chris , here is the eclipseLink logs produced when using the optimized persistence.xml

2016-02-27T23:59:28.307+0300|Fine: SELECT CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID FROM CUSTOMER WHERE (CUSTOMERID = ?)
    bind => [1 parameter bound]
2016-02-27T23:59:28.310+0300|Fine: SELECT SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE FROM SMSACCOUNT WHERE (SMSACCOUNTID = ?)
    bind => [1 parameter bound]
2016-02-27T23:59:28.312+0300|Fine: select nextval('SEQ_GEN_SEQUENCE')
2016-02-27T23:59:28.327+0300|Fine: select nextval('number_seq')
2016-02-27T23:59:28.331+0300|Info: this is the id 1
2016-02-27T23:59:28.332+0300|Fine: INSERT INTO SMSACCOUNT (SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE) VALUES (?, ?, ?, ?)
    bind => [4 parameters bound]
2016-02-27T23:59:28.335+0300|Fine: INSERT INTO CUSTOMER (CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    bind => [15 parameters bound]
2016-02-27T23:59:28.337+0300|Fine: INSERT INTO CONTACTGROUP (GROUPID, CREATEBY, CREATEDATE, GROUPDESCRIPTION, GROUPNAME) VALUES (?, ?, ?, ?, ?)
    bind => [5 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.339+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.340+0300|Fine: bind => [3 parameters bound]
2016-02-27T23:59:28.342+0300|Fine: UPDATE CONTACTGROUP SET customerID = ? WHERE (GROUPID = ?)
    bind => [2 parameters bound]
2016-02-27T23:59:28.343+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]
2016-02-27T23:59:28.344+0300|Fine: bind => [2 parameters bound]

and below the eclipse link logs produced when using the original persistence.xml

2016-02-28T08:56:25.440+0300|Fine: SELECT CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID FROM CUSTOMER WHERE (CUSTOMERID = ?)
    bind => [1 parameter bound]
2016-02-28T08:56:25.443+0300|Fine: SELECT SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE FROM SMSACCOUNT WHERE (SMSACCOUNTID = ?)
    bind => [1 parameter bound]
2016-02-28T08:56:25.445+0300|Fine: select nextval('SEQ_GEN_SEQUENCE')
2016-02-28T08:56:25.447+0300|Fine: select nextval('number_seq')
2016-02-28T08:56:25.449+0300|Info: this is the id 1
2016-02-28T08:56:25.450+0300|Fine: INSERT INTO SMSACCOUNT (SMSACCOUNTID, OOOREDOOO_BALANCE, VIVA_BALANCE, ZAIN_BALANCE) VALUES (?, ?, ?, ?)
    bind => [4 parameters bound]
2016-02-28T08:56:25.451+0300|Fine: INSERT INTO CUSTOMER (CUSTOMERID, CIVILIDNUMBER, CREATEDATE, CREATEDBY, EMAIL, FULLNAME, ISACTIVE, ISADMIN, MALE, PASSWORD, PERSONAL, PHONENUMBER, STATUS, USERNAME, ACCOUNT_SMSACCOUNTID) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    bind => [15 parameters bound]
2016-02-28T08:56:25.452+0300|Fine: INSERT INTO CONTACTGROUP (GROUPID, CREATEBY, CREATEDATE, GROUPDESCRIPTION, GROUPNAME) VALUES (?, ?, ?, ?, ?)
    bind => [5 parameters bound]
2016-02-28T08:56:25.452+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.453+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.453+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.454+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.455+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.456+0300|Fine: INSERT INTO PHONENUMBER (NUMBERID, OPERATOR, PHONENUMBER) VALUES (?, ?, ?)
    bind => [3 parameters bound]
2016-02-28T08:56:25.456+0300|Fine: UPDATE CONTACTGROUP SET customerID = ? WHERE (GROUPID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.457+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.457+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.458+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.458+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.459+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.459+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.460+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]
2016-02-28T08:56:25.461+0300|Fine: UPDATE PHONENUMBER SET groupId = ? WHERE (NUMBERID = ?)
    bind => [2 parameters bound]

obviously there is a big difference between the queries produced when using the optimized persistence.xml and the original persistence.xml .


Solution

  • Turn on EclipseLink's SQL logging and you should see a difference in how the statements are prepared and processed in JDBC, which should show why there is a 15 second difference.

    I'm not familiar with the eclipselink.connection-pool.sequence.intial property - What you should likely be using is the allocationSize configuration within your sequence generator itself to allow obtaining 1000 sequences at a time.

    If it is not set, batch writing will cut down the number of insert statements, but you will still see a large number of statements to obtain sequence numbers, but over different connections - sequencing is using its own connection pool.