For load-balancing reasons, I want to create more partitions than reducers in a Hadoop environment. Is there a way to assign partitions to specific reducers, and if so, where can I define this? I wrote a custom Partitioner and now want to send specific partitions to a specific reducer.
Thank you in advance for the help!
Hadoop doesn't lend itself to this kind of control.
As explained on pp. 43-44 of this excellent book, the programmer has little control over:
BUT
You can influence point 4 by implementing a carefully designed custom Partitioner that splits your data exactly the way you want and distributes the load across reducers as expected. Check out how they implement a custom partitioner to compute relative frequencies in chapter 3.3.
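Some background on why a Partitioner is the right place for this. In Hadoop the number of partitions always equals the number of reduce tasks (set with `job.setNumReduceTasks(...)`), and partition `i` is processed by reduce task `i`. So "more partitions than reducers" has to be modelled inside `getPartition(key, value, numPartitions)`: hash each key into a larger set of "virtual" buckets, then decide explicitly which reducer owns each bucket. The sketch below shows only that mapping logic in plain Java, so it runs without Hadoop on the classpath. The class and method names (`BucketPartitioning`, `bucketFor`, `roundRobinTable`, `reducerFor`) and the bucket count of 16 are hypothetical. In a real job you would return `reducerFor(...)` from your own `Partitioner` subclass and register it with `job.setPartitionerClass(...)`.

```java
import java.util.Arrays;

// Hypothetical helper, not a Hadoop API: hash keys into more "virtual"
// buckets than there are reducers, then decide explicitly which reducer
// owns each bucket.
final class BucketPartitioning {

    // Hypothetical bucket count; choose it larger than the number of reducers.
    static final int NUM_BUCKETS = 16;

    // Non-negative hash modulo the bucket count (the same masking trick
    // Hadoop's HashPartitioner applies before taking the modulus).
    static int bucketFor(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
    }

    // Default table: deal the buckets out to reducers round-robin.
    static int[] roundRobinTable(int numReducers) {
        int[] table = new int[NUM_BUCKETS];
        for (int b = 0; b < NUM_BUCKETS; b++) {
            table[b] = b % numReducers;
        }
        return table;
    }

    // The value a custom Partitioner's getPartition(...) would return.
    // Every table entry must lie in [0, numReducers), because partition i
    // is handed to reduce task i.
    static int reducerFor(String key, int[] table, int numReducers) {
        if (table.length != NUM_BUCKETS) {
            throw new IllegalArgumentException("table must have " + NUM_BUCKETS + " entries");
        }
        int bucket = bucketFor(key);
        int reducer = table[bucket];
        if (reducer < 0 || reducer >= numReducers) {
            throw new IllegalStateException("bucket " + bucket + " maps to reducer "
                    + reducer + ", outside [0, " + numReducers + ")");
        }
        return reducer;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        // Hand-tuned table: suppose bucket 15 is known to be heavy, so it gets
        // reducer 3 to itself while the other reducers share 5 buckets each.
        int[] table = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3};
        System.out.println("table = " + Arrays.toString(table));
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + bucketFor(key)
                    + " -> reducer " + reducerFor(key, table, numReducers));
        }
    }
}
```

Because every map task computes partitions independently, the bucket table has to be identical and deterministic everywhere, for example hard-coded or shipped to the tasks through the job configuration.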