apache-flink amazon-emr flink-streaming

Flink on AWS EMR Task Nodes


Is it possible to run Flink TaskManagers on the task nodes of AWS EMR? If so, how does that differ from running TaskManagers on a core node?


Solution

  • Yes, you should be able to run TaskManagers (TMs) on task nodes. The only difference I'd expect is that EMR won't schedule the Flink JobManager (JM) on a task node, because on YARN the JM runs inside the application master container ("Amazon EMR ... allows application master processes to run only on core nodes").

    If your workflow has sources that read from HDFS and/or sinks that write to HDFS, then subtasks of those operators running on task nodes might be slower: task nodes don't run the HDFS DataNode daemon, so they hold no local blocks and every read or write goes over the network.
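
To make the two differences concrete, here is a minimal sketch of the placement rules as I understand them. It is a toy model, not an EMR or Flink API: `NodeRole`, `can_host`, and `hdfs_io_is_always_remote` are hypothetical names made up for illustration, and the application master restriction is taken from the EMR quote above, so it may not hold on every EMR release or configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRole:
    """Hypothetical model of an EMR node type (not a real EMR or Flink class)."""
    name: str
    runs_hdfs_datanode: bool   # core nodes store HDFS blocks, task nodes do not
    may_host_app_master: bool  # assumption: follows the EMR quote in the answer


# Assumed roles, based only on what the answer states.
CORE = NodeRole("core", runs_hdfs_datanode=True, may_host_app_master=True)
TASK = NodeRole("task", runs_hdfs_datanode=False, may_host_app_master=False)


def can_host(role: NodeRole, flink_process: str) -> bool:
    """Return True if the given Flink process may be placed on this node type."""
    if flink_process == "jobmanager":
        # On YARN the JobManager lives in the application master container,
        # so it inherits the application master placement restriction.
        return role.may_host_app_master
    if flink_process == "taskmanager":
        # TaskManagers are ordinary YARN containers and can land on either type.
        return True
    raise ValueError(f"unknown Flink process: {flink_process!r}")


def hdfs_io_is_always_remote(role: NodeRole) -> bool:
    """True if subtasks on this node type can never read a local HDFS block."""
    return not role.runs_hdfs_datanode


if __name__ == "__main__":
    for role in (CORE, TASK):
        print(
            role.name,
            "JM:", can_host(role, "jobmanager"),
            "TM:", can_host(role, "taskmanager"),
            "HDFS always remote:", hdfs_io_is_always_remote(role),
        )
```

So if your job is dominated by HDFS sources or sinks, TMs on core nodes at least have a chance of local reads, while TMs on task nodes never do. If it mostly reads from something like Kafka or S3, the node type should matter much less.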