How to submit a Hadoop streaming job and check execution history with Hadoop 2.x


I am new to Hadoop. In Hadoop 1.x, I can submit a Hadoop streaming job from the master node and check the result and execution time from the namenode web UI.

The following is sample code for Hadoop streaming in Hadoop 1.x:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc

However, in Hadoop 2.x the JobTracker has been removed. How can I get the same functionality in Hadoop 2.x?


Solution

  • In Hadoop 2.x, you can view jobs in several ways:

    1) From the ResourceManager UI: ResourceManagerHostname:8088/cluster
    2) From HUE: HUEServerHostname.com:8888/jobbrowser/
    3) From the command line (once the job has completed):

    usage: yarn logs -applicationId <application ID> [OPTIONS]

    general options are:
      -appOwner AppOwner        (assumed to be current user if not specified)
      -containerId ContainerId  (must be specified if node address is specified)
      -nodeAddress NodeAddress  in the format nodename:port (must be specified if container id is specified)

    Example: yarn logs -applicationId application_1414530900704_0005
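
To tie the options above back to the streaming job in the question, here is a rough sketch of how submitting it and checking it from the command line might look on Hadoop 2.x. The streaming jar path is an assumption (it is where the jar usually sits in a stock 2.x tarball install and may differ on your distribution), and the function names are made up for illustration. `yarn logs` generally only returns output when log aggregation is enabled on the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: the jar location and the helper names are assumptions.

# Assumed location of the streaming jar in a Hadoop 2.x install;
# override STREAMING_JAR if your distribution puts it elsewhere.
STREAMING_JAR="${STREAMING_JAR:-$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar}"

# Submit the same cat/wc streaming job used in the question.
submit_streaming_job() {
  # $STREAMING_JAR is deliberately unquoted so the shell expands the wildcard.
  "$HADOOP_HOME/bin/hadoop" jar $STREAMING_JAR \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
}

# A MapReduce job ID and its YARN application ID share the same numeric
# suffix, so job_<ts>_<n> corresponds to application_<ts>_<n>.
job_to_app_id() {
  printf 'application_%s\n' "${1#job_}"
}

# List applications that have already finished.
list_finished_apps() {
  yarn application -list -appStates FINISHED
}

# Print the status report (state, start and finish time) for one application.
show_app_status() {
  yarn application -status "$1"
}

# Fetch the aggregated container logs for one application.
show_app_logs() {
  yarn logs -applicationId "$1"
}
```

For example, after the job finishes, `show_app_logs "$(job_to_app_id job_1414530900704_0005)"` would run the same command as the example above, starting from the job ID that the streaming client prints.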