Simple Java-based Spark program doesn't finish

I created a very simple Java-based Spark "word count" program, and I am running it on a YARN cluster with the details below:

Hadoop details:

Master Node (NN, SNN, RM) -
Slave Nodes (DN, NM) -,

Spark details:

Master running on :
Workers running on :,

I have created a client machine from where I submit the Spark job (The IP address of client machine is -->

I used the command below to submit the job to Spark:

spark-submit --class com.example.WordCountTask --master yarn /root/SparkCodeInJava/word-count/target/word-count-1.0-SNAPSHOT.jar /spark/input/inputText.txt /spark/output

However, the program doesn't terminate at all. The data set is very small (10 lines of text), so I expect it to finish quickly.

Below is the output I see on the console after submitting the job:

17/03/26 19:54:42 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:43 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:44 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:45 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:46 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)

And this continues forever. I am not sure why this isn't getting completed.

This is what I see in GUI for this application:

[Screenshot of the application in the GUI; image not available]

Below is the output of: yarn logs -applicationId application_1490572543329_0002

17/03/26 20:24:09 WARN util.NativeCodeLoader: Unable to load native-hadoop libra
/tmp/logs/root/logs/application_1490572543329_0002 does not exist.

Log aggregation has not completed or is not enabled.

This is my first Spark program, and I configured it to run on a YARN cluster.

I simulate the distributed environment using four CentOS VMs running on VirtualBox.

Can anyone help me understand why this program isn't working properly?


I set up the environment in AWS with two well-provisioned instances (8 vCPUs and 32 GB RAM), but the job still doesn't complete.

(A) yarn-site.xml



(B) After submitting the Job using spark-submit, I see this in the output which is displayed on console:

17/03/29 15:51:35 INFO yarn.Client: Requesting a new application from cluster with **0 NodeManagers**

Does this have anything to do with the job not finishing?


  • From the log messages,

    YARN Application State: ACCEPTED, waiting for AM container to be allocated

    17/03/29 15:51:35 INFO yarn.Client: Requesting a new application from cluster with **0 NodeManagers**

    YARN is unable to allocate containers for the Spark application as there are no active NodeManager(s) available.

    NodeManagers use the property yarn.resourcemanager.resource-tracker.address to communicate with the ResourceManager. By default, the value of this property is set as

        ${yarn.resourcemanager.hostname}:8031

    The referenced property yarn.resourcemanager.hostname defaults to 0.0.0.0. NodeManagers will not be able to communicate with the RM unless the RM hostname is defined properly.

    Set this property in yarn-site.xml on all the nodes:

       <name>yarn.resourcemanager.hostname</name>
       <value>rm_hostname</value> <!-- Hostname of the node where the ResourceManager is started -->

    Also, the property name yarn.nodemanager.auxservices is incorrect; it must be yarn.nodemanager.aux-services.

    Restart the services after the changes.
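
As a rough sketch of how the two configuration problems described above could be spotted before restarting, the shell function below scans a yarn-site.xml file for them. The function name check_yarn_site is made up for illustration and is not part of Hadoop, and the grep patterns assume each <name> element sits on one line with no extra whitespace inside the tags.

```shell
#!/bin/sh
# Sketch only: flags the two yarn-site.xml issues discussed in the answer.
# check_yarn_site is a hypothetical helper name, not a Hadoop command.
# Assumption: every <name>...</name> element is written on a single line.
check_yarn_site() {
    conf="$1"
    status=0

    # Report when the ResourceManager hostname is not set in this file.
    if ! grep -qF '<name>yarn.resourcemanager.hostname</name>' "$conf"; then
        echo "missing: yarn.resourcemanager.hostname"
        status=1
    fi

    # Report the misspelled key; the correct name is
    # yarn.nodemanager.aux-services (with a hyphen).
    if grep -qF '<name>yarn.nodemanager.auxservices</name>' "$conf"; then
        echo "misspelled: yarn.nodemanager.auxservices"
        status=1
    fi

    # Exit status 0 means neither problem was found.
    return "$status"
}
```

It could be run on each node as check_yarn_site "$HADOOP_CONF_DIR/yarn-site.xml". After the services are restarted, yarn node -list should show the NodeManagers that registered with the ResourceManager; if it still lists none, the application will keep waiting in the ACCEPTED state.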