apache-flink

How does Flink deal with an open transaction when a failure happens?


I am using Flink 1.12.0, and I have a question about Flink's 2PC mechanism for its end-to-end consistency guarantee.

At the start of a checkpoint, a transaction is opened, and it is committed after the checkpoint completes successfully.

What happens if a failure occurs? I assume the open transaction should be rolled back. If so, when is it rolled back? Thanks.


Solution

  • Because operators and Task Managers are distributed across a cluster, Flink has to make sure that all components agree before it can declare a commit successful. As you said, Flink uses the two-phase commit protocol, which includes a pre-commit phase. The pre-commit is the key to dealing with failures during a checkpoint. As the documentation says:

    The pre-commit phase finishes when the checkpoint barrier passes through all of the operators and the triggered snapshot callbacks complete.
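
    To make the failure cases concrete, here is a minimal, standalone sketch of the idea. It is not Flink code: the class TwoPhaseCommitSketch and its methods write, preCommit, commit, recover, and visibleRecords are hypothetical names chosen for illustration, loosely modeled on the hooks of Flink's TwoPhaseCommitSinkFunction (beginTransaction, preCommit, commit, abort). The assumption it encodes is that a transaction whose checkpoint never completed is aborted on recovery, while a transaction that was pre-committed as part of a completed checkpoint is committed on recovery.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical model of a two-phase-commit sink; not a Flink API.
    class TwoPhaseCommitSketch {
        // Records that the external system exposes to readers.
        private final List<String> committed = new ArrayList<>();
        // Records written into the currently open transaction.
        private List<String> open = new ArrayList<>();
        // Transaction that was pre-committed but is not yet committed.
        private List<String> preCommitted = null;

        // Write a record into the open transaction (invisible to readers).
        void write(String record) {
            open.add(record);
        }

        // Phase 1: the checkpoint barrier arrived. Freeze the open
        // transaction and start a new one for records that follow.
        void preCommit() {
            preCommitted = open;
            open = new ArrayList<>();
        }

        // Phase 2: the checkpoint completed everywhere. Make the
        // pre-committed records visible.
        void commit() {
            if (preCommitted != null) {
                committed.addAll(preCommitted);
                preCommitted = null;
            }
        }

        // Restart after a failure. The flag says whether the checkpoint
        // that contained the pre-committed transaction had completed.
        void recover(boolean checkpointCompleted) {
            if (checkpointCompleted) {
                commit();            // finish the interrupted commit
            } else {
                preCommitted = null; // abort: roll back pre-committed data
            }
            open = new ArrayList<>(); // the open transaction is always lost
        }

        // Copy of what readers of the external system can see.
        List<String> visibleRecords() {
            return new ArrayList<>(committed);
        }

        public static void main(String[] args) {
            TwoPhaseCommitSketch sink = new TwoPhaseCommitSketch();
            sink.write("a");
            sink.preCommit();
            sink.commit();                 // normal path: "a" becomes visible
            sink.write("b");
            sink.recover(false);           // failure before checkpoint: "b" is rolled back
            sink.write("c");
            sink.preCommit();
            sink.recover(true);            // failure after completed checkpoint: "c" is committed
            System.out.println(sink.visibleRecords());
        }
    }
    ```

    In this model the rollback happens at recovery time: anything written after the last completed checkpoint is discarded, and the job replays that data from the restored state, so it is written again in a fresh transaction.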