
submitting hadoop-streaming jobs: yarn or hadoop?


What's the difference between submitting a hadoop-streaming job using the yarn jar command and using the hadoop jar command?

This is from the current documentation:

hadoop jar hadoop-streaming-2.7.1.jar \
  -D mapreduce.job.reduces=2 \
  -input myInputDirs \
  -output myOutputDir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc

But the same job could just as well be submitted with:

yarn jar hadoop-streaming-2.7.1.jar \
  -D mapreduce.job.reduces=2 \
  -input myInputDirs \
  -output myOutputDir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc

If the two commands are equivalent (as I think they are), which one is preferred, and why?


Solution

  • They are equivalent if your MapReduce framework is YARN (that is, mapreduce.framework.name is set to yarn). If not, hadoop jar will run your jar file with MRv1, while yarn jar will run it on YARN (MRv2).
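To see which case applies on a given client, one option is to look up mapreduce.framework.name in the client's mapred-site.xml. Below is a rough sketch of that check, assuming the client configuration lives in $HADOOP_CONF_DIR. The function name framework_name and the /etc/hadoop/conf fallback are hypothetical choices for illustration. It is a line-oriented text scan that expects the usual <name>/<value> layout, not a real XML parser, and it falls back to "local", the default shipped in mapred-default.xml in Hadoop 2.x.

```shell
#!/bin/sh
# Hypothetical helper: print the MapReduce framework selected by a client config.
# Rough text scan of mapred-site.xml, not a real XML parser; it assumes the
# conventional layout with <name>...</name> followed by <value>...</value>.
framework_name() {
  # Config dir: first argument, else $HADOOP_CONF_DIR, else an assumed default.
  conf_dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
  site="$conf_dir/mapred-site.xml"
  value=""
  if [ -f "$site" ]; then
    value=$(awk '
      # Remember once we have seen the property name we care about.
      /<name>[ \t]*mapreduce\.framework\.name[ \t]*<\/name>/ { found = 1 }
      # The first <value> after that name holds the setting; strip the tags.
      found && /<value>/ {
        sub(/.*<value>[ \t]*/, "")
        sub(/[ \t]*<\/value>.*/, "")
        print
        exit
      }
    ' "$site")
  fi
  # Not set in mapred-site.xml: Hadoop 2.x mapred-default.xml uses "local".
  echo "${value:-local}"
}

# Example usage: only "yarn" means hadoop jar and yarn jar should behave the same.
if [ "$(framework_name)" = "yarn" ]; then
  echo "framework is yarn: hadoop jar and yarn jar should be equivalent"
else
  echo "framework is $(framework_name)"
fi
```

This only inspects mapred-site.xml; a -D mapreduce.framework.name=... generic option on the command line would override whatever the file says, and XML comments or unusual formatting can fool the scan.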