I am newbie to Hadoop. In Hadoop 1.X, I can submit a hadoop streaming job from master node and check the result and execution time from the namenode web.
The following is the sample code for hadoop streaming in Hadoop 1.X:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc
However, in Hadoop 2.x, the job tracker is removed. How can I get the same feature in Hadoop 2.X?
In Hadoop 2.0, you can view the jobs in multiple ways
1) View the jobs from ResourceManager UI ResourceMnagerhostname:8088/cluster
2) View the jobs from HUE - HUEServerHostname.com:8888/jobbrowser/
3) From command line (once the job is completed)
usage: yarn logs -applicationId [OPTIONS]
general options are: -appOwner AppOwner (assumed to be current user if not specified) -containerId ContainerId (must be specified if node address is specified) -nodeAddress NodeAddress in the format nodename:port (must be specified if container id is specified) Example: yarn logs -applicationId application_1414530900704_0005