Tags: hadoop-yarn, emr, amazon-emr, apache-flink

Cannot use Apache Flink in Amazon EMR


I cannot start a YARN session of Apache Flink on Amazon EMR. The error message I get is:

$ tar xvfj flink-0.9.0-bin-hadoop26.tgz
$ cd flink-0.9.0
$ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
...
Diagnostics: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
...

I am using Flink version 0.9 and Amazon's Hadoop version 4.0.0. Any ideas or hints?

The full log can be found here: https://gist.github.com/headmyshoulder/48279f06c1850c62c28c


Solution

  • From the log:

    The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values. The Flink YARN client needs to store its files in a distributed file system

    Flink failed to read the Hadoop configuration files. They are picked up either from environment variables, e.g. HADOOP_HOME, or from a configuration directory that you set in flink-conf.yaml before executing your YARN command (see the sketch at the end of this answer).

    Flink needs to read the Hadoop configuration to know how to upload the Flink jar to the cluster file system such that the newly created YARN cluster can access it. If Flink fails to resolve the Hadoop configuration, it uses the local file system for uploading the jar. That means that the jar will be put on the machine you launch your cluster from. Thus, it won't be accessible from the Flink YARN cluster.

    Please see the Flink configuration page for more information.

    Edit: On Amazon EMR, export HADOOP_CONF_DIR=/etc/hadoop/conf lets Flink discover the Hadoop configuration directory.
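
    For example, a minimal launch sequence on an EMR 4.x master node could look like the lines below; the configuration path is the one from the edit above, everything else mirrors the commands in the question:

    $ cd flink-0.9.0
    # Tell the Flink YARN client where the cluster's Hadoop configuration lives
    $ export HADOOP_CONF_DIR=/etc/hadoop/conf
    # Start the YARN session as before; the Flink jar is now uploaded to the cluster file system instead of file://
    $ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096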
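
    Alternatively, the Hadoop configuration directory can be set in conf/flink-conf.yaml before starting the session. The key fs.hdfs.hadoopconf is the one documented for Flink 0.9; treat this as a sketch and verify it against the configuration page for your version:

    # conf/flink-conf.yaml
    # Absolute path to the directory containing core-site.xml, hdfs-site.xml and yarn-site.xml
    fs.hdfs.hadoopconf: /etc/hadoop/conf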