Tags: apache-spark, cloudera, gateway

Is there any general rule for assigning the gateway role in Cloudera Spark2?


I am planning to upgrade the existing Spark 1.6 to 2.1 in Cloudera, and I was advised to assign the gateway role to all Node Manager and Resource Manager nodes. The current gateway role is assigned to a proxy node, which is not included in the planned Spark2 setup because the proxy node already has too many (20+) roles. Can anyone give a suggestion here? I checked the Cloudera docs and don't see a guideline on this (or maybe I missed it?).

Thanks a lot.


Solution

  • I have a slight disagreement with the other answer, which says:

    By default any host running a service will have the config files included so you don't need to add a gateway role to your Node Manager and Resource Manager roles

    Just having Node Manager and Resource Manager running on a node only gives you the configuration files for YARN, not Spark2. That said, you only need to deploy the Spark2 gateway role to your edge node, where end users are allowed to log in and run command-line tools such as beeline, the hdfs commands, and spark-shell/spark-submit. As a security policy, no one should be allowed to log in to your Node Manager/DataNode hosts.

    In your case, that edge node looks like what you call the proxy node. The gateway role is just configuration files, not a running process, so I don't think you need to be concerned about the number of roles already on that host; see the sketch below.
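
    As a minimal sketch of what this looks like on the edge node once the Spark2 gateway role is added and "Deploy Client Configuration" has been run from Cloudera Manager: the /etc/spark2/conf path and the examples jar location below are the usual CDS 2.x parcel defaults and may differ in your environment.

        # The gateway role only drops client configuration files here; no daemon runs.
        ls /etc/spark2/conf
        # spark-defaults.conf  spark-env.sh  yarn-conf/ ...

        # End users on the edge node can then submit jobs to YARN with that config,
        # e.g. the bundled SparkPi example (jar path assumes the CDS 2.x parcel layout).
        spark2-submit \
          --master yarn \
          --deploy-mode cluster \
          --class org.apache.spark.examples.SparkPi \
          /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples*.jar 100

    If the commands and config directory show up as above, the gateway is doing its job; nothing else needs to run on that node for Spark2.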