Map-reduce via Oozie


If I use Oozie to run a MapReduce job, is there a fixed number of mappers that will be started? Is it:

  1. one for Oozie and one for the MapReduce job, or
  2. one for Oozie and one mapper for every 64 MB block (the default block size)?

Solution

  • Other answers focus on how many map and reduce tasks a MapReduce job needs. Since you specifically ask about Oozie, I will share my experience running MapReduce (in Pig) via Oozie.

    Explanation

    When you kick off an Oozie workflow, it needs one YARN application of its own. I am not sure what the exact logic is, but these applications usually require 1 map task, and occasionally 2.

    Besides that, you still need the same number of mappers and reducers for the actual work as you would without Oozie. (If you see a different number than you expected, this may be because specific map or reduce properties were passed when calling the script.)
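
To make the arithmetic concrete, here is a rough sketch of the estimate described above. It assumes a single splittable input, one input split per 64 MB block (the block size quoted in the question), and one extra map task for Oozie. The function name `estimate_map_tasks` and its parameters are hypothetical helpers for illustration, not part of Oozie or Hadoop, and real split counts depend on the input format and job settings.

```python
DEFAULT_BLOCK_BYTES = 64 * 1024 * 1024  # 64 MB, the block size quoted in the question


def estimate_map_tasks(input_bytes, split_bytes=DEFAULT_BLOCK_BYTES, oozie_maps=1):
    """Rough estimate of map tasks for a MapReduce job started via Oozie.

    Assumptions (not guaranteed by Hadoop or Oozie):
    - one input split per block-sized chunk of a single splittable input
    - `oozie_maps` extra map tasks for Oozie itself (observed: 1, sometimes 2)
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")

    # Integer ceiling division: a partial block still needs its own mapper.
    work_maps = (input_bytes + split_bytes - 1) // split_bytes
    return {"oozie": oozie_maps, "work": work_maps, "total": oozie_maps + work_maps}


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(estimate_map_tasks(one_gib))
```

Under these assumptions, 1 GiB of input gives 16 mappers for the actual work plus 1 for Oozie, which matches option 2 in the question rather than option 1.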

    Warning

    This means that if you have 100 available containers and kick off 100 workflows (for example, by starting a daily job with a start date 100 days in the past), the workflows are likely to take up all available containers, leaving the actual work suspended indefinitely.
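
The starvation scenario in the warning can be sketched with a very simplified model. It assumes every running workflow holds a fixed number of containers for Oozie's own use and never releases them until its work finishes. The function names and the `oozie_containers` parameter are hypothetical, and the model ignores YARN queues, ApplicationMaster containers, and scheduler limits.

```python
def containers_left_for_work(total_containers, workflows, oozie_containers=1):
    """Containers still free for real map/reduce tasks once every workflow
    has claimed its own Oozie container(s).

    Simplified model: each workflow holds `oozie_containers` containers
    (assumed 1 here; the answer above observed 1, occasionally 2).
    """
    if total_containers < 0 or workflows < 0 or oozie_containers < 0:
        raise ValueError("arguments must be non-negative")
    held = min(total_containers, workflows * oozie_containers)
    return total_containers - held


def is_starved(total_containers, workflows, oozie_containers=1):
    """True when the workflows alone use every container, so no actual
    work can be scheduled and all workflows wait on each other."""
    return containers_left_for_work(total_containers, workflows, oozie_containers) == 0


if __name__ == "__main__":
    # The case from the warning: 100 containers, 100 backfilled daily workflows.
    print(is_starved(100, 100))
    # The same cluster with only 10 workflows running at once.
    print(containers_left_for_work(100, 10))
```

In this model, 100 workflows on 100 containers leave nothing for the actual work, while limiting the number of workflows running at once to 10 leaves 90 containers free.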