Tags: hadoop, mapreduce, combiners

How can I find out if a task is a reducer or a combiner during run time in Hadoop?


If the operation performed with MapReduce is not commutative and associative, then the combiner cannot be the same as the reducer.

For example, when calculating an average value, the combiner sums the values for a key, while the reducer sums them and then divides the sum by the total number of values for that key. The combiner's code differs only slightly. What if you could use the same class for both the combiner and the reducer, and have a piece of code that determines whether the current task is a combiner or a reducer? If it finds out that it is a reducer, then it divides the sum by the count.

Something like this:

protected void reduce(Text keyIn, Iterable<PairWritable> valuesIn,
    Context context) throws IOException, InterruptedException {
  double sum = 0.0d;
  long count = 0L;

  // Aggregate the partial sums and counts for this key.
  for (PairWritable valueIn : valuesIn) {
    sum += valueIn.getSum();
    count += valueIn.getCount();
  }

  // Only a real reduce task should turn the sum into an average.
  if (THIS_IS_A_REDUCER) {
    sum /= count;
  }

  context.write(keyIn, new PairWritable(sum, count));
}
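
For reference, the snippet assumes a PairWritable that just carries a partial sum and a count. A minimal sketch of such a custom Writable (my assumption, not necessarily how yours is implemented) could look like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Minimal custom Writable holding a partial sum and the number of values it covers.
public class PairWritable implements Writable {
  private double sum;
  private long count;

  // Hadoop needs a no-arg constructor to deserialize instances.
  public PairWritable() {
  }

  public PairWritable(double sum, long count) {
    this.sum = sum;
    this.count = count;
  }

  public double getSum() {
    return sum;
  }

  public long getCount() {
    return count;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(sum);
    out.writeLong(count);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    sum = in.readDouble();
    count = in.readLong();
  }
}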

Is it possible to do this? Can the THIS_IS_A_REDUCER placeholder above be replaced with something?

I can determine whether a task is a mapper or a reducer from the task attempt ID string, but combiners and reducers seem to have similar string patterns.


Solution

  • I suppose you could interrogate the Context object and get the task ID. Then, once you have the ID, the mapper (including the combiner, which runs as part of the map task) will have an "m" in the name, while a reducer will have an "r" in the name.

    To get the task attempt ID, use .getTaskAttemptID(). I think you should be able to call context.getTaskAttemptID() for this, but I can't test it to be sure. A rough sketch is below.
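
    Put together with the question's method, a minimal, untested sketch of that check, assuming the task attempt ID string follows the usual attempt_<job>_<m|r>_<task>_<attempt> pattern, might look like this:

    protected void reduce(Text keyIn, Iterable<PairWritable> valuesIn,
        Context context) throws IOException, InterruptedException {
      double sum = 0.0d;
      long count = 0L;

      for (PairWritable valueIn : valuesIn) {
        sum += valueIn.getSum();
        count += valueIn.getCount();
      }

      // Task attempt IDs contain "_m_" for map-side attempts (where the
      // combiner runs) and "_r_" for reduce attempts.
      boolean thisIsAReducer =
          context.getTaskAttemptID().toString().contains("_r_");

      if (thisIsAReducer) {
        sum /= count;
      }

      context.write(keyIn, new PairWritable(sum, count));
    }

    Depending on your Hadoop version, you may also be able to compare context.getTaskAttemptID().getTaskType() against TaskType.REDUCE instead of parsing the string, but check that against the mapreduce API you are using.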