In the map phase of my program, I need to know the total number of mappers that are created. I use this number when building the map output keys: for each object, I want to emit as many key-value pairs as there are mappers.
I know that setting the number of mappers is just a hint, but how can I get the actual number of mappers? I tried the following in the configure method of my Mapper:
    public void configure(JobConf conf) {
        System.out.println("map tasks: " + conf.get("mapred.map.tasks"));
        System.out.println("tipid: " + conf.get("mapred.tip.id"));
        System.out.println("taskpartition: " + conf.get("mapred.task.partition"));
    }
But I get the results:
map tasks: 1
tipid: task_local1204340194_0001_m_000000
taskpartition: 0
map tasks: 1
tipid: task_local1204340194_0001_m_000001
taskpartition: 1
The task IDs suggest (?) that there are two map tasks, which is expected since I have two small input files, yet "map tasks" prints 1. Shouldn't the number after "map tasks" be 2?
For now, I just count the number of files in the input folder, but this is not a good solution, since a file larger than the block size results in more than one input split and hence more than one mapper. Any suggestions?
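To show why counting files can undercount, here is a rough sketch of how the number of splits could be estimated from file sizes. It does not use the Hadoop API: the class SplitEstimate, its methods, and the splitSize parameter are hypothetical names of mine, and the 1.1 slop factor and the one-split-per-empty-file rule reflect my understanding of how FileInputFormat behaves, not something I have verified for every version. The effective split size also depends on the block size and job settings, and non-splittable (e.g. compressed) files would always count as one split.

```java
public class SplitEstimate {
    // Assumption: the last split of a file may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // Estimates how many splits a single file of fileSize bytes would produce.
    public static int splitsForFile(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        if (fileSize == 0) {
            return 1; // assumption: an empty file still yields one (empty) split
        }
        int splits = 0;
        long remaining = fileSize;
        // Carve off full-size splits while the remainder is clearly larger than one split.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining > 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    // Sums the per-file estimates over all input files.
    public static int estimateSplits(long[] fileSizes, long splitSize) {
        int total = 0;
        for (long size : fileSizes) {
            total += splitsForFile(size, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = {10 * mb, 200 * mb}; // two input files
        // Two files, but the estimate is 5 splits with a 64 MB split size.
        System.out.println(estimateSplits(sizes, 64 * mb));
    }
}
```

With two files of 10 MB and 200 MB and a 64 MB split size, the estimate is 5 rather than 2, so the file count alone is not a reliable stand-in for the number of mappers.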
Finally, it seems that conf.get("mapred.map.tasks") DOES work after all, when I generate an executable jar file and run my program on the cluster or locally. In that case the output of "map tasks" is correct.
It failed only when I ran my MapReduce program locally on Hadoop from the Eclipse plugin, so it may be an issue with the plugin.
I hope this will help someone else having the same issue. Thank you for your answers!