Tags: hadoop, apache-spark, hadoop-yarn, hadoop2

Simple Java-based Spark program doesn't finish


I created a very simple "word count" Spark program in Java, and I am running it on a YARN cluster with the details below:

Hadoop details:

Master Node (NN, SNN, RM) - 192.168.0.100
Slave Nodes (DN, NM) - 192.168.0.105, 192.168.0.108

Spark details:

Master running on : 192.168.0.100
Workers running on : 192.168.0.105, 192.168.0.108

I have created a client machine from which I submit the Spark job (the IP address of the client machine is 192.168.0.240).

This is the command I used to submit the job to Spark:

spark-submit --class com.example.WordCountTask --master yarn /root/SparkCodeInJava/word-count/target/word-count-1.0-SNAPSHOT.jar /spark/input/inputText.txt /spark/output
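
The program itself is just the classic word count: read the input file, split each line into words, count occurrences, and save the result to the output directory. Roughly (a simplified sketch, written against the Spark 2.x Java API), com.example.WordCountTask looks like this:

    package com.example;

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class WordCountTask {
        public static void main(String[] args) {
            // args[0] = input file on HDFS, args[1] = output directory
            SparkConf conf = new SparkConf().setAppName("WordCountTask");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // read lines, split into words, count each word, write results
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator());
            JavaPairRDD<String, Integer> ones = words.mapToPair(word -> new Tuple2<>(word, 1));
            JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

            counts.saveAsTextFile(args[1]);
            sc.stop();
        }
    }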

However, the program doesn't terminate at all. The data set is very small (10 lines of text), so I expect it to finish quickly.

This is the output I see on the console after submitting the job:

17/03/26 19:54:42 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:43 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:44 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:45 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)
17/03/26 19:54:46 INFO yarn.Client: Application report for application_1490572543329_0001 (state: ACCEPTED)

And this continues forever. I am not sure why it never completes.

This is what I see in the GUI for this application:

[screenshot of the YARN application UI]

This is the output of: yarn logs -applicationId application_1490572543329_0002

17/03/26 20:24:09 WARN util.NativeCodeLoader: Unable to load native-hadoop libra
/tmp/logs/root/logs/application_1490572543329_0002 does not exist.

Log aggregation has not completed or is not enabled.
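
If it is relevant: I don't think I have enabled log aggregation anywhere. As far as I understand, it is controlled by the following property in yarn-site.xml (which defaults to false and which I have not set):

    <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
    </property>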

This is my first Spark program, and I configured it to run on a YARN cluster.

I simulate the distributed environment using 4 VMs running CentOS on VirtualBox.

Can anyone help me figure out why this program isn't working properly?

Update:

I set up the environment on AWS with two launched instances with a good configuration (8 vCPUs and 32 GB RAM), but the job still doesn't complete.

(A) yarn-site.xml

    <property>
            <name>yarn.nodemanager.auxservices</name>
            <value>mapreduce_shuffle</value>
    </property>

    <property>
            <name>yarn.resourcemanager.address</name>
            <value>ip-XXX-YYYY-ZZZ-AAA.us-west-2.compute.internal:8032</value>
    </property>

(B) After submitting the job using spark-submit, I see this in the output displayed on the console:

17/03/29 15:51:35 INFO yarn.Client: Requesting a new application from cluster with **0 NodeManagers**

Does this have anything to do with the job not finishing?


Solution

  • From the log messages,

    YARN Application State: ACCEPTED, waiting for AM container to be allocated

    17/03/29 15:51:35 INFO yarn.Client: Requesting a new application from cluster with **0 NodeManagers**
    

    YARN is unable to allocate containers for the Spark application as there are no active NodeManager(s) available.

    NodeManagers use the property yarn.resourcemanager.resource-tracker.address to communicate with the ResourceManager. By default, this property is set as

    <property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>${yarn.resourcemanager.hostname}:8031</value>
    </property> 
    

    The referenced property yarn.resourcemanager.hostname defaults to 0.0.0.0. NodeManagers will not be able to communicate with the ResourceManager unless the RM hostname is defined properly.

    Modify this property in yarn-site.xml on all the nodes:

     <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>rm_hostname</value> <!-- Hostname of the node where Resource Manager is started -->
    </property> 
    

    Also, the property name yarn.nodemanager.auxservices in your yarn-site.xml is incorrect; it must be yarn.nodemanager.aux-services.
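
    With that correction, the aux-services entries in yarn-site.xml would look like the snippet below (the second property, the shuffle handler class, is not shown in the question; it is commonly configured alongside the first and is included here for completeness):

    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
    </property>

    <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>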

    Restart the services after the changes.
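
    For example, on a plain Hadoop 2.x setup the YARN daemons can be restarted from the ResourceManager node and the NodeManager registration verified as follows (the sbin path assumes a standard $HADOOP_HOME layout; adjust for your installation):

    # restart the YARN daemons
    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/start-yarn.sh

    # list the NodeManagers that have registered with the ResourceManager
    yarn node -list

    Once yarn node -list (and the spark-submit output) reports more than 0 NodeManagers, YARN can allocate the ApplicationMaster container and the application should move from ACCEPTED to RUNNING.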