I have installed a multi-node HDP cluster with Spark and YARN on EC2.
Every node is a DataNode.
Node3 is the only Spark client node.
Every time I run a Spark job in yarn-client or yarn-cluster mode, it always starts the Spark executors on node3, whereas I want the job to use every node.
What configuration am I missing?
For example, I set MASTER="yarn-client" in Ambari, but that doesn't solve the problem.
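For illustration, the submission looks something like this (the jar path, class name, and resource numbers are placeholders, not my actual values):

    spark-submit \
      --master yarn-cluster \
      --num-executors 6 \
      --executor-memory 2g \
      --executor-cores 2 \
      --class com.example.MyApp \
      /path/to/my-app.jar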
Thanks for your help.
EDIT: When I run a Spark shell with 30 executors, it only starts 12 executors, all on node3, and they take up 95% of the cluster's resources. So my guess is that node1 and node2 aren't taken into account by YARN when it allocates resources such as Spark containers/executors.
I don't know which configuration I should modify to add node1 and node2 to the cluster's pool of resources.
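For reference, the shell was started along these lines (the executor memory value is just an example):

    spark-shell \
      --master yarn-client \
      --num-executors 30 \
      --executor-memory 1g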
Okay, this turned out to be a simple oversight on my part: I had to add every node as a YARN NodeManager. With that done, my Spark jobs are distributed across every node of the cluster.
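For anyone hitting the same issue: once the NodeManager component is running on every node (it can be added per host in Ambari), you can check that YARN actually sees all the nodes with:

    yarn node -list

Each NodeManager should be listed in the RUNNING state; after that, the executors were spread across node1, node2, and node3 as expected.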