I have three reducers, and I need each of them to always receive one specific key, like so:
GOOG - Reducer 0
AAPL - Reducer 1
VMW - Reducer 2
In the partitioner, the getPartition() method should return an int giving the index of the reducer: one of 0, 1 or 2.
My implementation of getPartition() is:
return ((CompositeKey) key).getSymbol().hashCode() % numReduceTasks;
However, this is not working. Here is what I get:
int numReduceTasks = 3;
System.out.println("GOOG".hashCode() % numReduceTasks);//output: 0
System.out.println("AAPL".hashCode() % numReduceTasks);//output: 1
System.out.println("VMW".hashCode() % numReduceTasks);//output: 1
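To check that the collision comes from the arithmetic and not from Hadoop, the same computation can be reproduced on its own. This is a minimal sketch; the class and method names (HashPartitionDemo, hashPartition) are hypothetical. It also clears the sign bit the way Hadoop's default HashPartitioner does, because String.hashCode() can be negative, and a plain % would then return a negative, invalid reducer index.

```java
// Hypothetical demo class: reproduces hashCode() % numReduceTasks outside Hadoop.
class HashPartitionDemo {

    // Clears the sign bit first (as Hadoop's HashPartitioner does), so the
    // result is always in [0, numReduceTasks) even for negative hash codes.
    static int hashPartition(String symbol, int numReduceTasks) {
        return (symbol.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

With three reducers, hashPartition("AAPL", 3) and hashPartition("VMW", 3) both return 1: their hash codes are 2001436 and 85120, and both leave a remainder of 1 when divided by 3. Nothing about hashing prevents that.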
So in the output files I get:
.../part-r-00000
GOOG
.../part-r-00001
AAPL
VMW
.../part-r-00002
<empty>
The question is: how do I fix this? That is, how do I write a partitioner that guarantees each symbol goes to its own, separate reducer?
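The code is working exactly as it should. String.hashCode() is deterministic, not random, but it is not unique modulo 3: nothing guarantees that three different symbols give three different values when you take % 3, and here "AAPL" and "VMW" both land on 1. A hash partitioner only guarantees that the same key always goes to the same reducer, not that different keys go to different reducers. The only way I see to do this is a series of if statements that make an explicit, deterministic decision: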
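String symbol = ((CompositeKey) key).getSymbol();
if ("GOOG".equals(symbol)) return 0;
if ("AAPL".equals(symbol)) return 1;
if ("VMW".equals(symbol)) return 2;
return (symbol.hashCode() & Integer.MAX_VALUE) % numReduceTasks; // any other symbol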
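The same decision can also be driven by a lookup table, which keeps the symbol-to-reducer assignment in one place. This is a minimal sketch in plain Java (9+ for Map.of), assuming the symbol has already been extracted from the CompositeKey; SymbolPartitioner and its getPartition(String, int) are hypothetical names, not Hadoop API. In a real job the same body would sit inside getPartition(key, value, numPartitions) of a class extending org.apache.hadoop.mapreduce.Partitioner, registered with job.setPartitionerClass(...).

```java
import java.util.Map;

// Hypothetical, Hadoop-free sketch of an explicit symbol -> reducer assignment.
class SymbolPartitioner {

    // Fixed assignment: each known symbol gets its own reducer index.
    private static final Map<String, Integer> ASSIGNED =
            Map.of("GOOG", 0, "AAPL", 1, "VMW", 2);

    static int getPartition(String symbol, int numPartitions) {
        Integer assigned = ASSIGNED.get(symbol);
        if (assigned != null) {
            // Keep the index valid if the job runs with fewer reducers than the table expects.
            return assigned % numPartitions;
        }
        // Unknown symbol: fall back to HashPartitioner-style hashing, which is
        // deterministic and always in [0, numPartitions).
        return (symbol.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With three reducers, getPartition("VMW", 3) returns 2, and an unlisted symbol such as "MSFT" still lands on a valid reducer through the hash fallback. The % numPartitions on the table lookup is a design choice: it avoids an out-of-range index when fewer reducers are configured, at the cost of symbols sharing a reducer in that case.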
Some advice: going "outside the box" in MapReduce is a dangerous game. The best way to use MapReduce is to play by its rules so that you inherit its benefits. That isn't always possible, but you should always try!