I have installed a multi-node HDP cluster with Spark and YARN on EC2.
Every node is a DataNode.
Node3 is the only Spark client node.
Every time I run a Spark job in yarn-client or yarn-cluster mode, it always starts the Spark executors on node3, whereas I want the job to use every node.
What configuration am I missing?
For example, I set MASTER="yarn-client" in Ambari, but that doesn't solve the problem.
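For illustration, the submission looks something like this (the jar path, class name, and resource numbers are placeholders, not my actual values):

    spark-submit \
      --master yarn-cluster \
      --num-executors 6 \
      --executor-memory 2g \
      --executor-cores 2 \
      --class com.example.MyApp \
      /path/to/my-app.jar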
Thanks for your help.
EDIT: When I run a Spark shell with 30 executors, it only starts 12 executors, all on node3, and they take up 95% of the cluster's resources. So my guess is that node1 and node2 aren't taken into account by YARN when it allocates resources such as Spark containers/executors.
I don't know which configuration I should modify to add node1 and node2 to the cluster's pool of resources.
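For reference, the shell was started along these lines (the executor memory value is just an example):

    spark-shell \
      --master yarn-client \
      --num-executors 30 \
      --executor-memory 1g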
Okay, this turned out to be a simple oversight on my part: I had to add every node as a YARN NodeManager. With that done, my Spark jobs are distributed across every node of the cluster.
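For anyone hitting the same issue: once the NodeManager component is running on every node (it can be added per host in Ambari), you can check that YARN actually sees all the nodes with:

    yarn node -list

Each NodeManager should be listed in the RUNNING state; after that, the executors were spread across node1, node2, and node3 as expected.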