hadoop, hadoop-yarn, apache-flink, amazon-emr

Configure Flink REST API on Amazon EMR


I'm running a Flink app via YARN on Amazon's EMR, with one master and one slave.

I'm trying to SSH into the master node and then access the Flink REST API, but I can't get EMR to expose it on a fixed host and port.

I've tried adding the configuration below to EMR and taking the host from the private DNS name of the current master node. However, the port the API actually runs on is different for each yarn-session.

[
  {
    "Classification": "flink-conf",
    "Properties": {
      "rest.port": "44477",
      "jobmanager.web.port": "44477",
      "jobmanager.web.upload.dir": "/home/hadoop"
    }
  }
]

I've verified that the properties are reflected in the flink-conf.yaml file as well.

Here is an excerpt from the startup log:

2018-09-06 21:34:33,749 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: env.yarn.conf.dir, /etc/hadoop/conf
2018-09-06 21:34:33,751 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: env.hadoop.conf.dir, /etc/hadoop/conf
2018-09-06 21:34:33,751 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: rest.port, 44477
2018-09-06 21:34:33,751 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.web.port, 44477

Flink JobManager is now running on ip-10-2-3-25.ec2.internal:41161 with leader id 00000000-0000-0000-0000-000000000000.
JobManager Web Interface: http://ip-10-2-3-25.ec2.internal:41161

Solution

  • I also emailed the Flink mailing list about this and learned that the behavior is a result of running on YARN, which is how I have Flink set up on EMR. On YARN, the configured values are ignored because the host and port are set at runtime. They can be looked up with the CLI command yarn application -status [appId], where [appId] is the YARN application's id, which can be found with yarn application -list.
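
Since the port changes with each yarn-session, the lookup can be scripted. Below is a minimal Python sketch, not a definitive implementation. It assumes the report printed by yarn application -status contains a line of the form "Tracking-URL : http://host:port" and that this URL points at the Flink web/REST endpoint; check the output on your own cluster, since both are assumptions here. The function names parse_tracking_url, host_and_port and flink_rest_endpoint are made up for this example.

```python
import subprocess
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_tracking_url(report: str) -> Optional[str]:
    """Return the value of the 'Tracking-URL' field of a YARN report, or None.

    Assumption: fields are printed one per line as 'Name : value'.
    """
    for line in report.splitlines():
        # Split on the first colon only, so the colons inside the URL survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Tracking-URL":
            value = value.strip()
            # Treat an empty or 'N/A' value as "no URL available".
            return value if value and value != "N/A" else None
    return None


def host_and_port(url: str) -> Tuple[Optional[str], Optional[int]]:
    """Split a URL (or a bare 'host:port' string) into host and port."""
    if "://" not in url:
        # urlparse needs a scheme to recognise the host part.
        url = "http://" + url
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


def flink_rest_endpoint(app_id: str) -> str:
    """Run 'yarn application -status <app_id>' and return the tracking URL."""
    result = subprocess.run(
        ["yarn", "application", "-status", app_id],
        check=True,           # raise if the yarn command fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    url = parse_tracking_url(result.stdout)
    if url is None:
        raise RuntimeError("no Tracking-URL found for " + app_id)
    return url
```

On the master node, calling flink_rest_endpoint with the id shown by yarn application -list should return the current URL, and host_and_port splits it when the host and port are needed separately.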