
Flink JobManager HA on EMR


Stack: EMR emr-6.1.0 (1 master node, 4 core nodes). Installed applications: Flink 1.11.0.

AWS documentation says (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-configure.html):

Beginning with Amazon EMR version 5.28.0, JobManager high availability is also enabled automatically. No manual configuration is needed.

But when I send a kill signal to the Flink JobManager container with yarn container -signal container_1601027657994_0003_01_000001 GRACEFUL_SHUTDOWN (same with FORCEFUL_SHUTDOWN), nothing happens: YARN does not restart the application.

  1. Do I need to enable ZooKeeper on EMR as well? (Most probably yes; otherwise I don't understand how Flink would know which savepoint to restart the application from.)
  2. Should I use an EMR cluster with 3 master nodes to have HA for Flink?

Solution

  • Yes. To get JobManager HA you need an EMR cluster with 3 master nodes; EMR then automatically adds the failover configuration to flink-conf.yaml (tested with EMR 6.1.0).
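
One way to confirm that EMR actually wrote the failover settings is to inspect flink-conf.yaml on a master node. The sketch below is only an illustration, not an official tool: it assumes the file lives at /etc/flink/conf/flink-conf.yaml (the path may differ on your cluster) and that HA is ZooKeeper-based, so it looks for the Flink 1.11 options high-availability, high-availability.zookeeper.quorum and high-availability.storageDir. The function names are made up for this example.

```python
from pathlib import Path

# Flink option names for ZooKeeper-based JobManager HA (Flink 1.11 naming).
HA_KEYS = (
    "high-availability",
    "high-availability.zookeeper.quorum",
    "high-availability.storageDir",
)


def parse_flink_conf(text):
    """Parse flat 'key: value' lines, skipping blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first ': ' only, so values such as host:2181 stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def ha_enabled(conf):
    """True if the mode is 'zookeeper' and every HA option has a value."""
    mode_ok = conf.get("high-availability", "").lower() == "zookeeper"
    return mode_ok and all(conf.get(key) for key in HA_KEYS)


if __name__ == "__main__":
    # Assumed location of the Flink configuration on an EMR master node.
    path = Path("/etc/flink/conf/flink-conf.yaml")
    if path.exists():
        conf = parse_flink_conf(path.read_text())
        for key in HA_KEYS:
            print(f"{key}: {conf.get(key, '<missing>')}")
        print("JobManager HA configured:", ha_enabled(conf))
    else:
        print(f"{path} not found; adjust the path for your cluster")
```

If the script reports the options as missing on a single-master cluster, that matches the behaviour described above: without the 3-master setup there is no failover configuration for YARN and Flink to use.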