amazon-web-services amazon-emr aws-security-group

Why does AWS EMR require 2 different security groups for Master and Core/Task nodes?


I am setting up an EMR job and find that I must specify separate security groups for the Master and Core/Task nodes. What is the point of having two? If I run in client mode, I will only use the Master security group anyway. And I believe that if I run the EMR job in cluster mode, it should only use the Core/Task security group. Is this not correct?

That is at least my understanding, because when I choose between client and cluster mode I am told:

Run your driver on a slave node (cluster mode) or on the master node as an external client (client mode).


Solution

  • As per Working With Amazon EMR-Managed Security Groups:

    The Security Group on the Master node allows:

    • Communication from Master nodes of other Amazon EMR clusters
    • Communication from the Core and Task nodes
    • Communication from the AWS cluster manager to control the cluster

    The Security Group on the Core/Task nodes allows:

    • Communication from other Core and Task nodes
    • Communication from the Master node
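
The two rule sets listed above can be sketched as a small lookup table. This is only an illustrative model of the rules as quoted here, not an AWS API: the role labels (`master`, `core-task`, `master-other-cluster`, `emr-service`) and the `is_allowed` helper are hypothetical names chosen for this example.

```python
# Illustrative model only: the role labels and this table restate the
# inbound rules quoted above; they are not part of any AWS API.
INGRESS_RULES = {
    # destination group -> sources it accepts traffic from
    "master": {"master-other-cluster", "core-task", "emr-service"},
    "core-task": {"core-task", "master"},
}


def is_allowed(source: str, destination: str) -> bool:
    """Return True if the destination group's inbound rules admit the source."""
    return source in INGRESS_RULES.get(destination, set())
```

Under this model, `is_allowed("emr-service", "master")` is true while `is_allowed("emr-service", "core-task")` is false, which is the asymmetry that makes two separate groups useful.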

    Typically, the Security Group on the Master node is also opened so that you can connect to it directly (e.g., to run Hive from the command line).

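    Access to the Core/Task nodes goes exclusively through the Master node: any submitted job goes to the Master node first, then to the Core/Task nodes. Both groups are therefore needed whether the driver runs in client mode or cluster mode, because security groups control network traffic between the nodes, not where the driver runs.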
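
As a rough sketch of how the two groups are supplied when a cluster is created programmatically: boto3's `run_job_flow` call accepts `EmrManagedMasterSecurityGroup` and `EmrManagedSlaveSecurityGroup` inside its `Instances` argument. The helper name and the security group IDs below are placeholders made up for this example, and the actual API call is left as a comment because it needs boto3, credentials, and the rest of the cluster settings.

```python
def emr_security_group_settings(master_sg_id: str, core_task_sg_id: str) -> dict:
    """Build the security group part of the Instances argument.

    Hypothetical helper for illustration; only the two dictionary keys
    are real boto3 run_job_flow field names.
    """
    return {
        "EmrManagedMasterSecurityGroup": master_sg_id,
        "EmrManagedSlaveSecurityGroup": core_task_sg_id,
    }


# Placeholder IDs, not real security groups.
settings = emr_security_group_settings(
    "sg-0123456789abcdef0", "sg-0fedcba9876543210"
)

# With boto3 installed and credentials configured, these keys would be
# merged into the Instances argument alongside the other cluster settings:
#   import boto3
#   emr = boto3.client("emr")
#   emr.run_job_flow(Name=..., ReleaseLabel=..., Instances={**settings, ...}, ...)
```

Both keys are set once for the cluster, independently of whether a later Spark step uses client or cluster deploy mode.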