Search code examples
javamultithreadingconcurrencycyclicbarrier

Why cyclicBarrier can't be acquired right after barrier action execution?


Let's consider the following piece of code:

public static void main(String[] args) throws InterruptedException {
    CyclicBarrier cb = new CyclicBarrier(3, () -> {
        logger.info("Barrier action starting");
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        logger.info("Barrier action finishing");
    });
    for (int i = 0; i < 6; i++) {
        int counter = i;
        Thread.sleep(100);
        new Thread(() -> {
            try {
                logger.info("Try to acquire barrier for {}", counter);
                cb.await();
                logger.info("barrier acquired for {}", counter);

            } catch (Exception e) {
                e.printStackTrace();
            }

        }).start();
    }
}

I've created barrier with size = 3 and barrirer action which takes 5 seconds.

I see following output:

2019-10-27 15:23:09.393  INFO   --- [       Thread-0] my.playground.RemoteServiceFacade        : Try to acquire barrier for 0
2019-10-27 15:23:09.492  INFO   --- [       Thread-1] my.playground.RemoteServiceFacade        : Try to acquire barrier for 1
2019-10-27 15:23:09.593  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Try to acquire barrier for 2
2019-10-27 15:23:09.594  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Barrier action starting
2019-10-27 15:23:09.693  INFO   --- [       Thread-3] my.playground.RemoteServiceFacade        : Try to acquire barrier for 3
2019-10-27 15:23:09.794  INFO   --- [       Thread-4] my.playground.RemoteServiceFacade        : Try to acquire barrier for 4
2019-10-27 15:23:09.897  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Try to acquire barrier for 5
2019-10-27 15:23:14.594  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Barrier action finishing
2019-10-27 15:23:14.595  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : barrier acquired for 2
2019-10-27 15:23:14.595  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Barrier action starting
2019-10-27 15:23:19.596  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Barrier action finishing
2019-10-27 15:23:19.597  INFO   --- [       Thread-0] my.playground.RemoteServiceFacade        : barrier acquired for 0
2019-10-27 15:23:19.597  INFO   --- [       Thread-4] my.playground.RemoteServiceFacade        : barrier acquired for 4
2019-10-27 15:23:19.597  INFO   --- [       Thread-3] my.playground.RemoteServiceFacade        : barrier acquired for 3
2019-10-27 15:23:19.597  INFO   --- [       Thread-1] my.playground.RemoteServiceFacade        : barrier acquired for 1
2019-10-27 15:23:19.597  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : barrier acquired for 5

So we can see that:

  1. First barrier action lasts 15:23:09 - 15:23:14
  2. Second barrier action lasts 15:23:14 - 15:23:19

But only one thread was able to log after first barrier action termination:

2019-10-27 15:23:14.595  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : barrier acquired for 2

I expected 3 threads should be able to acquire approximately at 15:23:14 because CyclicBarrier size is 3.

Could you explain this behaviour?

P.S.

I tried to run this code a lot of time and always the similar result.

P.S.2.

I tried to change timings a bit:

public static void main(String[] args) throws InterruptedException {
    CyclicBarrier cb = new CyclicBarrier(3, () -> {
        logger.info("Barrier action starting");
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        logger.info("Barrier action finishing");
    });
    for (int i = 0; i < 6; i++) {
        int counter = i;
        Thread.sleep(1000);
        new Thread(() -> {
            try {
                logger.info("Try to acquire barrier for {}", counter);
                cb.await();
                logger.info("barrier acquired for {}", counter);

            } catch (Exception e) {
                e.printStackTrace();
            }

        }).start();
    }
}

And I see expected results:

2019-10-27 23:22:14.497  INFO   --- [       Thread-0] my.playground.RemoteServiceFacade        : Try to acquire barrier for 0
2019-10-27 23:22:15.495  INFO   --- [       Thread-1] my.playground.RemoteServiceFacade        : Try to acquire barrier for 1
2019-10-27 23:22:16.495  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Try to acquire barrier for 2
2019-10-27 23:22:16.496  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Barrier action starting
2019-10-27 23:22:16.998  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : Barrier action finishing
2019-10-27 23:22:16.998  INFO   --- [       Thread-0] my.playground.RemoteServiceFacade        : barrier acquired for 0
2019-10-27 23:22:16.998  INFO   --- [       Thread-2] my.playground.RemoteServiceFacade        : barrier acquired for 2
2019-10-27 23:22:16.998  INFO   --- [       Thread-1] my.playground.RemoteServiceFacade        : barrier acquired for 1
2019-10-27 23:22:17.495  INFO   --- [       Thread-3] my.playground.RemoteServiceFacade        : Try to acquire barrier for 3
2019-10-27 23:22:18.495  INFO   --- [       Thread-4] my.playground.RemoteServiceFacade        : Try to acquire barrier for 4
2019-10-27 23:22:19.496  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Try to acquire barrier for 5
2019-10-27 23:22:19.499  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Barrier action starting
2019-10-27 23:22:20.002  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : Barrier action finishing
2019-10-27 23:22:20.003  INFO   --- [       Thread-5] my.playground.RemoteServiceFacade        : barrier acquired for 5
2019-10-27 23:22:20.003  INFO   --- [       Thread-3] my.playground.RemoteServiceFacade        : barrier acquired for 3
2019-10-27 23:22:20.003  INFO   --- [       Thread-4] my.playground.RemoteServiceFacade        : barrier acquired for 4

Solution

  • A fascinating question, it's not super trivial but I'll try to explain it as succinctly as I can.

    Although multithreading doesn't guarantee any type of execution order, for this answer let's imagine there are two sequences that happen first:

    1. All threads start at exactly the same time
    2. All threads invoke barrier.await() at the same time.

    In this case you'd see an output like

    Try to acquire barrier for 0
    Try to acquire barrier for 1
    Try to acquire barrier for 2
    Try to acquire barrier for 3
    Try to acquire barrier for 4
    Try to acquire barrier for 5
    

    The current state of your 6 threads are as follows:

    1. Threads 0 and 1 await on a shared Condition since three threads have not reached the barrier "yet"

    2. Thread 2 is still runnable as the "tripping" thread for the barrier

    3. Threads 3, 4 and 5 will wait on a lock.lock invocation of the barrier (the same Lock instance in which the Condition was created.

    When the barrier sees thread 2 it previously recorded 0 and 1 as reaching the barrier so it knows that this cycle is completed and will release 0 and 1. But before it releases the other two threads, it needs to run the barrierAction you define as sleeping for 5 seconds, so it does that.

    You'd then see output

    Barrier action starting
    Barrier action finishing
    

    Thread 2 still holds the lock, is RUNNABLE and is ready to exit the barrier, so it does that and you see this output.

    barrier acquired for 2
    

    But before thread 2 exists, it will signal all other threads waiting on the current barrier. Here is where it gets tricky, the await by 0 and 1 is done on the shared Condition. The Condition is shared across all barrier "generations". So even though the first two threads were awaiting prior to the second 3 threads locking, when a signalAll is done, the first generation threads still have to wait their turn to wake up.

    At this point we have 5 threads in a BLOCKED (3, 4 & 5) or TIMED_WAITING (0, 1) state. In this example the timing is important of when they block/wait on the Lock. If they all happen in order the queue for the critical section would be:

    Thread-0 -> Thread-1 -> Thread-5 -> Thread-4 -> Thread-3 
       |                                               |
      TAIL                                            HEAD
    

    So that the next threads released will be Thread-3, then 4 and subsequently 5. The reason the queue looks like this is because all threads reach the lock at the same time and they all queue up, thread 0 and 1 obviously reached it first and so make it into the barrier's critical section, but then await while thread 2 comes in to wake them up, but now 0 and 1 get placed on the end of the queue and 3, 4 and 5 will be triggered next.

    When thread 2 leaves the barrier and signalAll threads 3 and 4 will run and since they are part of the second generation will suspend until thread 5 comes through and triggers the barrier action. This then prints out

    Barrier action starting
    Barrier action finishing
    

    Finally, thread 5 will signalAll again and the rest of the threads will wake up complete.

    In this case you will see thread 5 first complete and the rest to follow

    barrier acquired for 5
    barrier acquired for 0
    barrier acquired for 1
    barrier acquired for 3
    barrier acquired for 4