I am using Flink 1.12.0, and I got a question about Flink 2PC mechanism for end-to-end consistency guarantee.
At the start of checkpoint, a transaction is opened, and the transaction is committed after the successful completion of checkpoint.
Then what happens if failure happens? I think the opened transaction should be rolled back? also when the transaction is rolled back? Thanks.
Because the operators and Task Managers are distributed inside a cluster, Flink has to ensure that all components agree together in order to claim that a commit is successful. Flink uses the 2 phase commit protocol, as you said, and with a pre-commit. The pre-commit is the key to deal with failures during the checkpoint, as it says on the documentation
The pre-commit phase finishes when the checkpoint barrier passes through all of the operators and the triggered snapshot callbacks complete.