Search code examples
springmultithreadingspring-data-jpaspring-transactionsspring-integration-jdbc

Spring integration jdbc distributed lock: multiple threads and transactions


To reproduce the issue:

docker run -e POSTGRES_USER=localtest -e POSTGRES_PASSWORD=localtest -e POSTGRES_DB=orders -p 5432:5432 -d postgres:15.3

Then run this code:

https://github.com/cuipengfei/Spikes/tree/master/jpa/lock-transaction-threads

@Controller
public class TestLockController {
    private Logger logger = LoggerFactory.getLogger(TestLockController.class);

    @Autowired
    private TestLockService service;

    @GetMapping("test-lock")
    public ResponseEntity<String> testLock() {
        logger.info("start");

        Arrays.asList("a", "b", "c").parallelStream().forEach(key -> {
            logger.info("going to call test lock method with key: {}", key);
            service.testLock(key);
        });

        logger.info("end");

        return ResponseEntity.ok("done");
    }
}

When the /test-lock api is called, it will use parallel stream to start new threads to run the testLock method 3 time.

@Service
public class TestLockService {
    private Logger logger = LoggerFactory.getLogger(TestLockService.class);

    @Autowired
    private JdbcLockRegistry jdbcLockRegistry;

    @Transactional
    public void testLock(String key) {
        logger.info("{} start with transaction {}", key,
                TransactionAspectSupport.currentTransactionStatus().hashCode());

        boolean isLocked = false;
        Lock lock = jdbcLockRegistry.obtain(key);
        try {
            pretendToDoWork(key);
            // ↑ comment this out to see errors

            isLocked = lock.tryLock(10, TimeUnit.SECONDS);
            if (isLocked) {
                logger.info("{} lock ok", key);
            } else {
                logger.info("{} lock failed", key);
            }
        } catch (Throwable t) {
            logger.error("{} lock failed with exception: {}", key, t.getMessage());
        } finally {
            if (isLocked) {
                logger.info("{} unlock", key);
                lock.unlock();
            } else {
                logger.info("{} skip unlock since it was never locked", key);
            }
        }

        logger.info("{} end with transaction {}", key,
                TransactionAspectSupport.currentTransactionStatus().hashCode());
    }

    private void pretendToDoWork(String key) throws InterruptedException {
        Random random = new Random();
        int randomNumber = random.nextInt(3000);
        logger.info("{} pretend to work for {} before try to lock", key, randomNumber);
        Thread.sleep(randomNumber);
    }
}

Then in the testLock method of the service, it will try to get lock.

When I trigger this API, it prints the following log:

2023-12-22 21:49:26.188  INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : start
2023-12-22 21:49:26.189  INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: b
2023-12-22 21:49:26.189  INFO 20180 --- [nPool-worker-37] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: c
2023-12-22 21:49:26.190  INFO 20180 --- [nPool-worker-51] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: a
2023-12-22 21:49:26.208  INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a start with transaction 1915798295
2023-12-22 21:49:26.208  INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b start with transaction 353593802
2023-12-22 21:49:26.208  INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c start with transaction 1079367566
2023-12-22 21:49:26.209  INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a pretend to work for 2609 before try to lock
2023-12-22 21:49:26.209  INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b pretend to work for 2217 before try to lock
2023-12-22 21:49:26.209  INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c pretend to work for 918 before try to lock
2023-12-22 21:49:27.174  INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c lock ok
2023-12-22 21:49:27.174  INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c unlock
2023-12-22 21:49:27.182  INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c end with transaction 1079367566
2023-12-22 21:49:28.444  INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b lock ok
2023-12-22 21:49:28.444  INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b unlock
2023-12-22 21:49:28.449  INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b end with transaction 353593802
2023-12-22 21:49:28.834  INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a lock ok
2023-12-22 21:49:28.834  INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a unlock
2023-12-22 21:49:28.838  INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a end with transaction 1915798295
2023-12-22 21:49:28.838  INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : end

Then if I comment out this line:

pretendToDoWork(key);
// ↑ Comment this out to see errors!

Then run it again, it will print the following logs:

2023-12-22 21:36:30.192  INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : start
2023-12-22 21:36:30.192  INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: b
2023-12-22 21:36:30.193  INFO 39588 --- [nPool-worker-51] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: a
2023-12-22 21:36:30.193  INFO 39588 --- [nPool-worker-37] c.g.s.e.controllers.TestLockController   : going to call test lock method with key: c
2023-12-22 21:36:30.210  INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c start with transaction 344273392
2023-12-22 21:36:30.211  INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b start with transaction 2109875677
2023-12-22 21:36:30.210  INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a start with transaction 1083949093
2023-12-22 21:36:30.241  INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b lock ok
2023-12-22 21:36:30.241  INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b unlock
2023-12-22 21:36:30.247  INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService    : b end with transaction 2109875677
2023-12-22 21:36:30.247 ERROR 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a lock failed with exception: Failed to lock mutex at 0cc175b9-c0f1-36a8-b1c3-99e269772661; nested exception is org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
2023-12-22 21:36:30.247 ERROR 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c lock failed with exception: Failed to lock mutex at 4a8a08f0-9d37-3737-9564-9038408b5f33; nested exception is org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
2023-12-22 21:36:30.247  INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a skip unlock since it was never locked
2023-12-22 21:36:30.247  INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c skip unlock since it was never locked
2023-12-22 21:36:30.247  INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService    : a end with transaction 1083949093
2023-12-22 21:36:30.247  INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService    : c end with transaction 344273392
2023-12-22 21:36:30.247  INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController   : end

Now I get errors.

Question is: there are 3 threads, each using a different transaction, each trying to get lock with a different key. They seem independent from each other. Why are they conflicting with each other when the pretendToDoWork method is removed?

And if the above code is the wrong way to do it, what is the proper way to use the jdbc distributed lock in multiple threads using multiple transactions?

-----update 2023-12-25-----

code updated: https://github.com/cuipengfei/Spikes/tree/master/jpa/lock-transaction-threads/src/main/java/com/github/spring/example/service

  • Problem1Service.java: the issue described above

  • Problem1FixService.java: introduce a custom lock repo which uses its own separate trans manager

  • Problem2Service.java: based on problem 1 fix, run some sql before try to get lock, will get transaction isolation level change error due to running biz sql and lock in same transaction

  • Problem2BadFixService.java: try to fix problem 2 by making parallel stream not to use current thread, however this fix resutls in running biz sql outside of a trans

  • Problem2GoodFixService.java: try to fix problem 2 by narrowing down scope of trans and not mixing biz sql and lock in same trans

Does Problem2GoodFixService look like a proper way to use jdbc lock with multi-threads and @Transactional?


Solution

  • Why are they conflicting with each other when the pretendToDoWork method is removed?

    First of all, while the pretendToDoWork(key); method is out of comment line; lock accesses occur at different times, but when the method is commented out: lock accesses occur simultaneously and you get an error.

    Debugging

    To understand what's going on here; you can add t.printStackTrace(); to the catch block, or you can activate DEBUG mode by adding logging.level.org.springframework.orm.jpa=DEBUG to your application.properties file.

    The error you should pay attention to here is org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions.

    org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
        at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:331) ~[spring-orm-5.3.28.jar:5.3.28]
    ...
    org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.76.jar:9.0.76]
        at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
    Caused by: org.hibernate.TransactionException: Unable to commit against JDBC Connection
        at org.hibernate.resource.jdbc.internal.AbstractLogicalConnectionImplementor.commit(AbstractLogicalConnectionImplementor.java:92) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:282) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
        at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
        at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:562) ~[spring-orm-5.3.28.jar:5.3.28]
        ... 89 common frames omitted
    Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
      Detail: Reason code: Canceled on identification as a pivot, during commit attempt.
      Hint: The transaction might succeed if retried.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:316) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgConnection.executeTransactionCommand(PgConnection.java:928) ~[postgresql-42.5.4.jar:42.5.4]
        at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:950) ~[postgresql-42.5.4.jar:42.5.4]
        at com.zaxxer.hikari.pool.ProxyConnection.commit(ProxyConnection.java:387) ~[HikariCP-4.0.3.jar:na]
        at com.zaxxer.hikari.pool.HikariProxyConnection.commit(HikariProxyConnection.java) ~[HikariCP-4.0.3.jar:na]
        at org.hibernate.resource.jdbc.internal.AbstractLogicalConnectionImplementor.commit(AbstractLogicalConnectionImplementor.java:86) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
        ... 92 common frames omitted
    

    Solutions

    https://github.com/spring-projects/spring-integration/issues/3733 there are explanations about the error and why it occurs.

    The problem was solved a few versions later with PR https://github.com/spring-projects/spring-integration/pull/3782. (Example; you can upgrade spring-integration-jdbc in your pom.xml file to 6.0.9 or upgrade your Spring Boot version.)

    If you want to solve the problem manually in the current version; the solution in the title: How to retry a PostgreSQL serializable transaction with Spring? will work.