hadoop, mapreduce, hadoop2, hadoop-partitioning

Multiple reducers without running a partitioner in MapReduce


I am trying to understand the concept of running multiple reducers in an MR job, and I learned that it is the partitioner that decides which (key, value) pairs go to which reducer.

My question is:

Can we run multiple reducers without running a partitioner? Would that be a valid scenario?


Solution

  • Think of the partitioner as the entity that decides which reducer (bucket) will process a particular key-value pair (element) emitted by a mapper.

    The default partitioner applies a hash function to the key to divide the elements across the reducers. An analogy is how a core Java Map collection uses a hash function to decide the bucket (reducer) for an element (key-value pair).

    This process guarantees that the same key is always sent to a single reducer, which processes all the values for that key. So, if the mappers emit m unique keys (each key can occur any number of times) and there are n reducers, the partitioner tries to distribute the keys so that each reducer gets roughly m/n unique keys, each with the list of values associated with it.

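    Note that it is possible to set the number of reducers in the program. Doing so tells the partitioner how many buckets are available for distributing the keys. So running multiple reducers always involves a partitioner: if you do not set one yourself, the default hash partitioner is used.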
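
To make the bucket analogy concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes String keys and reuses the arithmetic of Hadoop's default HashPartitioner (hash code with the sign bit cleared, modulo the number of reducers). The class name PartitionSketch and the helpers partitionFor and assign are hypothetical names made up for this illustration, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Group mapper output keys by the reducer index they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReducers) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReducers);
            buckets.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Keys as they might be emitted by the mappers; duplicates are allowed.
        List<String> mapperKeys = Arrays.asList("a", "b", "c", "a", "b", "a");

        // Three reducers: every occurrence of a key lands in the same bucket.
        System.out.println(assign(mapperKeys, 3)); // {0=[c], 1=[a, a, a], 2=[b, b]}

        // One reducer: there is only one bucket, so every key goes to index 0.
        System.out.println(assign(mapperKeys, 1)); // {0=[a, b, c, a, b, a]}
    }
}
```

In a real job the reducer count would typically be set with job.setNumReduceTasks(n), and a custom partitioner, if one is wanted, with job.setPartitionerClass(...). The sketch only imitates the key-to-reducer mapping and leaves out the shuffle and sort steps.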