Search code examples
hadoop

What runs first: the partitioner or the combiner?


I was wondering between partitioner and combiner, which runs first?

I was of the opinion it is the partitiner first and then combiner and then the keys are redirected to different reducers, which appears like the partitioner, and so I'm confused. Please help me understand.


Solution

  • The direct answer to your question is => COMBINER

    Details: Combiner can be viewed as mini-reducers in the map phase. They perform a local-reduce on the mapper results before they are distributed further. Once the Combiner functionality is executed, it is then passed on to the Reducer for further work.

    where as

    The partitioner comes into the picture when we are working one more than on reducer. So, the partitioner decides which reducer is responsible for a particular key. They basically take the Mapper Result(if Combiner is used then Combiner Result) and send it to the responsible Reducer based on the key.

    For a better understanding you can refer the following image, which I have taken from Yahoo Developer Tutorial on Hadoop. Figure 4.6: Combiner step inserted into the MapReduce data flow
    (source: flickr.com)

    Here is the tutorial .