I am trying to understand partitioning
in MapReduce
and I came to know that Hadoop has a default partitioner known as HashPartitioner
, and partitioner helps in deciding to which reducer a given key would go to.
hashcode(key) % NumberOfReducers, where `key` is the key in <key,value> pair.
How does HashPartitioner
calculate the hash-code for the key? Does is simply call the hashCode() of the key or does this HashPartitioner
use some other logic to calculate the hash-code of the key?
Can anyone help me understand this?
The default partitioner simply use hashcode()
method of key and calculate the partition. This give you an opportunity to implement your hascode()
to tweak the way the keys are going to be partitioned.
From the javadoc:
public int getPartition(K key,
V value,
int numReduceTasks)
Use Object.hashCode() to partition.
For the actual code, it simply returns (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
:
public int More ...getPartition(K key, V value,
int numReduceTasks) {
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
EDIT: Adding detail on custom partitioner
You can add a different logic for partitioner, which even may not use hashcode()
at all.
As custom partitioner can be written by extending Partitioner
public class CustomPartitioner extends Partitioner<Text, Text>
One such example, which works on the properties of the custom key object:
public static class CustomPartitioner extends Partitioner<Text, Text>{
@Override
public int getPartition(Text key, Text value, int numReduceTasks){
String emp_dept = key.getDepartment();
if(numReduceTasks == 0){
return 0;
}
if(key.equals(new Text(“IT”))){
return 0;
}else if(key.equals(new Text(“Admin”))){
return 1 % numReduceTasks;
}else{
return 2 % numReduceTasks;
}
}