Search code examples
springhibernatespring-batchc3p0jta

Rollback fail : connection closed after database restart on Spring-batch


I'm reproducing a connection problem by manually killing and restarting my PostGreSQL database during a Spring Batch. I've been using the DriverManagerDataSource, which just crashed at this point, so I had to change. Now I'm trying to use C3P0 (with Hibernate).

The post is long, but it's mostly config files or stacktraces you do not necessarily need to read.

My main problem is (I think): c3p0 closes the connection, then Sping-JTA tries to rollback against it, and fails, which kills the batch. And I don't know how to prevent it.


Configuration:

I'm using Spring, Spring batch (4.2.3), hibernate (4.3.6) with c3p0 (0.9.5.2) and postgresql.

applicationContext :

<bean id="jpaVendorAdapter"
    class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
    <property name="database" value="POSTGRESQL" />
    <property name="databasePlatform" value="org.hibernate.dialect.PostgreSQLDialect" />
    <property name="generateDdl" value="false" />
    <property name="showSql" value="false" />
</bean>

<bean id="entityManagerFactory"
    class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
    <property name="persistenceXmlLocation" value="classpath:META-INF/persistence.xml" />
    <property name="persistenceUnitName" value="ASO_PU" />
    <property name="jpaVendorAdapter" ref="jpaVendorAdapter" />
</bean>

<tx:annotation-driven transaction-manager="transactionManager" />
<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactory" />
</bean>

persitence.xml

<persistence-unit name="ASO_PU" transaction-type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <class> ** my entities </class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="hibernate.dialect" value="org.hibernate.dialect.PostgreSQLDialect"/>

        <property name="hibernate.connection.driver_class" value="org.postgresql.Driver" />
        <property name="hibernate.connection.username" value="postgres" />
        <property name="hibernate.connection.password" value="postgres" />
        <property name="hibernate.connection.url" value="jdbc:postgresql://localhost:5434/aso_test" />
        <property name="hibernate.connection.provider_class" value="org.hibernate.connection.C3P0ConnectionProvider"/>

        <property name="hibernate.c3p0.maxSize" value="6"/> <!-- maxPoolSize -->
        <property name="hibernate.c3p0.minSize" value="2"/> <!-- minPoolSize -->
        <property name="hibernate.c3p0.maxStatements" value="30"/> <!-- maxStatements -->
        <property name="hibernate.c3p0.validate" value="true"/> <!-- testConnectionOnCheckout -->
        <property name="hibernate.c3p0.preferredTestQuery" value="SELECT 1"/> <!--  -->

            <property name="hibernate.c3p0.acquireRetryAttempts" value="121"/>
            <property name="hibernate.c3p0.acquireRetryDelay" value="2002"/>

        </properties>
    </persistence-unit>

What happens:

When the application tries to connect to the DB after I killed it, it warns me, and then waits until the DB is up again (I don't think it's relevant but here is the trace anyway:)

ATTENTION: [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
31 août 2016 11:44:54 com.mchange.v2.c3p0.impl.NewPooledConnection handleThrowable
ATTENTION: [c3p0] Another error has occurred [ org.postgresql.util.PSQLException: This connection has been closed. ] which will not be reported to listeners!
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:837)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.rollback(AbstractJdbc2Connection.java:854)
    at com.mchange.v2.c3p0.impl.NewProxyConnection.rollback(NewProxyConnection.java:860)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doRollback(JdbcTransaction.java:163)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.rollback(AbstractTransactionImpl.java:211)
    at org.hibernate.jpa.internal.TransactionImpl.rollback(TransactionImpl.java:108)
    at org.springframework.orm.jpa.JpaTransactionManager.doRollback(JpaTransactionManager.java:544)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:853)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:830)
    at org.springframework.transaction.support.TransactionTemplate.rollbackOnException(TransactionTemplate.java:164)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144)
    at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:198)
    at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148)
    at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:64)
    at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:67)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:165)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144)
    at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:134)
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:304)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128)
    at fr.aso.batch.test.MyBatchTest.test(MyBatchTest.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:254)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:89)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:193)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
31 août 2016 11:44:54 com.mchange.v2.c3p0.impl.NewPooledConnection handleThrowable
ATTENTION: [c3p0] A PooledConnection that has already signalled a Connection error is still in use!

After this exception, the batch hangs. Once the DB is up again (1 minute after), the whole batch crashes:

     01/09/2016 09:22:47.537 [ERROR]    REI - org.springframework.batch.core.job.AbstractJob            Exception encountered in afterStep callback 
org.springframework.transaction.TransactionSystemException: Could not roll back JPA transaction; nested exception is javax.persistence.PersistenceException: unexpected error when rollbacking
    at org.springframework.orm.jpa.JpaTransactionManager.doRollback(JpaTransactionManager.java:548) ~[spring-orm-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processRollback(AbstractPlatformTransactionManager.java:853) ~[spring-tx-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.rollback(AbstractPlatformTransactionManager.java:830) ~[spring-tx-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.transaction.interceptor.TransactionAspectSupport.completeTransactionAfterThrowing(TransactionAspectSupport.java:503) ~[spring-tx-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:285) ~[spring-tx-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:96) ~[spring-tx-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:208) ~[spring-aop-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at $Proxy204.afterJob(Unknown Source) ~[na:na]
    at org.springframework.batch.core.listener.CompositeJobExecutionListener.afterJob(CompositeJobExecutionListener.java:60) ~[spring-batch-core-3.0.4.RELEASE.jar:3.0.4.RELEASE]
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:346) ~[spring-batch-core-3.0.4.RELEASE.jar:3.0.4.RELEASE]
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135) [spring-batch-core-3.0.4.RELEASE.jar:3.0.4.RELEASE]
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128) [spring-batch-core-3.0.4.RELEASE.jar:3.0.4.RELEASE]
    at fr.aso.batch.OpsTest.testRecuperationEtabsEmployantSalariesCNAMJob(OpsTest.java:100) [test-classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.6.0_30]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) ~[na:1.6.0_30]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_30]
    at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_30]
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:254) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:89) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:193) [spring-test-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) [.cp/:na]
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) [.cp/:na]
Caused by: javax.persistence.PersistenceException: unexpected error when rollbacking
    at org.hibernate.jpa.internal.TransactionImpl.rollback(TransactionImpl.java:111) ~[hibernate-entitymanager-4.3.5.Final.jar:4.3.5.Final]
    at org.springframework.orm.jpa.JpaTransactionManager.doRollback(JpaTransactionManager.java:544) ~[spring-orm-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    ... 43 common frames omitted
Caused by: org.hibernate.TransactionException: rollback failed
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.rollback(AbstractTransactionImpl.java:217) ~[hibernate-core-4.3.6.Final.jar:4.3.6.Final]
    at org.hibernate.jpa.internal.TransactionImpl.rollback(TransactionImpl.java:108) ~[hibernate-entitymanager-4.3.5.Final.jar:4.3.5.Final]
    ... 44 common frames omitted
Caused by: org.hibernate.TransactionException: unable to rollback against JDBC connection
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doRollback(JdbcTransaction.java:167) ~[hibernate-core-4.3.6.Final.jar:4.3.6.Final]
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.rollback(AbstractTransactionImpl.java:211) ~[hibernate-core-4.3.6.Final.jar:4.3.6.Final]
    ... 45 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:837) ~[postgresql-9.3-1101.jdbc4.jar:na]
    at org.postgresql.jdbc2.AbstractJdbc2Connection.rollback(AbstractJdbc2Connection.java:854) ~[postgresql-9.3-1101.jdbc4.jar:na]
    at com.mchange.v2.c3p0.impl.NewProxyConnection.rollback(NewProxyConnection.java:860) ~[c3p0-0.9.2.1.jar:0.9.2.1]
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doRollback(JdbcTransaction.java:163) ~[hibernate-core-4.3.6.Final.jar:4.3.6.Final]
    ... 46 common frames omitted

Judging from this stacktrace, I think it goes like this: c3p0 closed the connection when the DB died, and when JTA tried to rollback, c3p0 hanged until the DB was alive again. Once it was up again, the rollback could start, but it tried to rollback on the closed connection, hence the Exception that kills the whole batch.

I thought that when the connection closes, c3p0 would get a new one, but I guess this isn't really possible on the rollback (it would try just after ?), and Spring JTA must try a rollback before it tries to reconnect.


The problem:

First, I don't know if what I'm saying is correct ... is it possible that JTA tries to rollback on the closed transaction ?

If so, I didn't found anything on how a problem during the JTA rollback could be handled, or maybe it's c3p0 ? hibernate ? or even Spring batch ?

What can I do ? (either to be sure of what's the problem is or how to solve it).

Any help appreciated. Thank you in advance (it's quite a long post ...)


Note

The crash can happen anytime, here it happens on the processor because of when I killed the DB. But I can also happen when spring batch update its job_instance_execution table, or whenever, but I think it's all managed by spring-JTA/hibernate/c3p0, so I'm looking at this rather than spring-batch.


Solution

  • Unless you are using formal distributed transactions (i.e. XAConnection), which you are not, the level at which a JDBC transaction is enforced is the Connection object. If a Connection dies with uncommitted transactional work, that work will be lost, period. There is nothing to be done about that. If you have partial work that you would like committed in the event that a Connection breaks, define your transactions in a more fine-grained way so that the work you would like to se committed is committed immediately.

    Unless you have set c3p0.unreturnedConnectionTimeout (not your issue), c3p0 will never close() Connections out from under the application. c3p0 is reporting an error from Postgres, which has noticed that the underlying Connection has been closed. Until your application (directly, or via hibernate) call close() on the Connection, it is checkout out, no matter how broken it is.

    In all that you describe, the only thing unusual is the pause before the second complaint. The probable sequence is:

    1. The Connection dies
    2. An Exception occurs when the application tries to use it
    3. A further Exception occurs when the application tries to roll it back (the first stack trace you show)
    4. Hopefully JPA/hibernate/whatever carefully calls close() on the Connection to return it to control of the Connection pool
    5. Before continuing, JPA/hibernate waits for a new, valid Connection to become available.
    6. Then it reports the prior failure to your application with an Exception.

    The ordering of steps 5 and 6 is a bit surprising. There's conceptually no reason why JPA couldn't report the failure to the application immediately. Besides that, everything is as it should be.

    "What can I do?"

    Ensure that you write your application so that it understands that JPA failures are possible, and ensures that the application is always in a consistent state despite that. On a failure, your application might retry work that has failed, or it might simply report the failure to a client (whether end user via some kind of message, or to other software via an Exception). If even after retries the failure persists, eventually you probably do have to report a failure (unless there's some way to workaround and recover from the failure without the database, which is unlikely). There's nothing an application can do to ensure that a database across the network won't simply disappear.

    At the c3p0 level, if you wish you can set testConnectionOnCheckin and idleConnectionTestPeriod to ensure prompt purging of bad Connections if database dropouts are frequent. (Your current setting, using testConnectionOnCheckout via hibernate.c3p0.validate is fine as far as correctness is concerned, but it will cause the pool replace Connections lazily, as clients demand them). See here and here for Connection testing docs.