
Hadoop: distribute partitions to reducers


For load-balancing reasons, I want to create more partitions than reducers in a Hadoop environment. Is there a way to assign partitions to specific reducers, and if so, where can I define that assignment? I wrote a custom Partitioner and now want to direct specific partitions to a specific reducer.

Thank you in advance for the help!


Solution

  • Hadoop doesn't lend itself to this kind of control.

    As explained on pp. 43-44 of this excellent book, the programmer has little control over:

    1. Where a mapper or reducer runs (i.e., on which node in the cluster).
    2. When a mapper or reducer begins or finishes.
    3. Which input key-value pairs are processed by a specific mapper.
    4. Which intermediate key-value pairs are processed by a specific reducer (this is the part you want to control).

    BUT

    You can influence number 4 by implementing a carefully designed custom Partitioner that splits your data the way you want, so that the load is distributed across reducers as expected. See how the book implements a custom partitioner to calculate relative frequencies in section 3.3.
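
    To make the custom-Partitioner idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It relies on one fact about Hadoop: the value returned by `Partitioner.getPartition(key, value, numReduceTasks)` is the index of the reduce task, so it must lie in `[0, numReduceTasks)`. "More partitions than reducers" therefore has to be modelled inside the partitioner, as fine-grained buckets that are mapped onto reducer indexes through a lookup table. The class name, `NUM_BUCKETS`, `bucketOf`, and both assignment tables are hypothetical names and values chosen for illustration, not something from the book or the question.

```java
import java.util.Arrays;

// Plain-Java sketch (no Hadoop dependency) of mapping many logical
// partitions ("buckets") onto a smaller number of reducers.
class BucketedPartitionSketch {

    // Hypothetical number of fine-grained buckets; chosen larger than the reducer count.
    static final int NUM_BUCKETS = 12;

    // Fine-grained logical partition of a key. Masking the sign bit keeps
    // the result non-negative even when hashCode() is negative.
    static int bucketOf(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
    }

    // Default assignment: spread buckets over reducers round-robin.
    static int[] roundRobinAssignment(int numBuckets, int numReducers) {
        int[] bucketToReducer = new int[numBuckets];
        for (int b = 0; b < numBuckets; b++) {
            bucketToReducer[b] = b % numReducers;
        }
        return bucketToReducer;
    }

    // Same shape as Partitioner.getPartition(key, value, numReduceTasks):
    // the returned index must lie in [0, numReduceTasks).
    static int getPartition(String key, int[] bucketToReducer, int numReduceTasks) {
        int reducer = bucketToReducer[bucketOf(key)];
        if (reducer < 0 || reducer >= numReduceTasks) {
            throw new IllegalStateException("bucket mapped to invalid reducer " + reducer);
        }
        return reducer;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        int[] table = roundRobinAssignment(NUM_BUCKETS, numReduceTasks);
        // Hand-tuned alternative: bucket 0 is assumed to be heavy, so it gets reducer 0 to itself.
        int[] handTuned = {0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2};
        System.out.println("round-robin table: " + Arrays.toString(table));
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + bucketOf(key)
                    + ", reducer " + getPartition(key, table, numReduceTasks)
                    + " (hand-tuned: " + getPartition(key, handTuned, numReduceTasks) + ")");
        }
    }
}
```

    In a real job, the same lookup would presumably live in a subclass of `org.apache.hadoop.mapreduce.Partitioner`, registered with `job.setPartitionerClass(...)` alongside `job.setNumReduceTasks(...)`. Note that this only fixes which reduce task index receives which bucket; which node that task runs on is still decided by the framework.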