
Hadoop: get the actual number of mappers


In the map phase of my program, I need to know the total number of mappers that are created. This will help me in the key creation process of the map (I want to emit as many key-value pairs for each object as the number of mappers).
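Here is a minimal Hadoop-free sketch of the key creation described above. It assumes the number of mappers is already known and pairs each object with every key from 0 to numMappers - 1. The class name, method name and integer key scheme are hypothetical. In an old-API mapper, the loop would call `output.collect(...)` on the `OutputCollector` instead of filling a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReplicatedKeys {

    /**
     * Produces one (key, value) pair per mapper for a single input object.
     * The integer keys 0 .. numMappers - 1 are a hypothetical choice; any
     * scheme yielding numMappers distinct keys would work the same way.
     */
    public static <V> List<Map.Entry<Integer, V>> replicate(V value, int numMappers) {
        List<Map.Entry<Integer, V>> pairs = new ArrayList<>(numMappers);
        for (int i = 0; i < numMappers; i++) {
            pairs.add(new SimpleEntry<>(i, value));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // With two mappers, each object is emitted twice, under keys 0 and 1.
        System.out.println(replicate("record-42", 2)); // prints [0=record-42, 1=record-42]
    }
}
```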

I know that setting the number of mappers is only a hint, but how can I get the actual number of mappers? I tried the following in the configure method of my Mapper:

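public void configure(JobConf conf) {
    // Number of map tasks recorded in the job configuration
    System.out.println("map tasks: " + conf.get("mapred.map.tasks"));
    // ID of the task-in-progress that runs this mapper
    System.out.println("tipid: " + conf.get("mapred.tip.id"));
    // Zero-based index of this map task among all map tasks
    System.out.println("taskpartition: " + conf.get("mapred.task.partition"));
}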

But I get the following output:

map tasks: 1
tipid: task_local1204340194_0001_m_000000
taskpartition: 0
map tasks: 1
tipid: task_local1204340194_0001_m_000001
taskpartition: 1

This seems to mean that there are two map tasks, not the one that was printed (which makes sense, since I have two small input files). Shouldn't the value after "map tasks" be 2?

For now, I just count the number of files in the input folder, but this is not a good solution: a file larger than the block size can produce more than one input split, and hence more than one mapper. Any suggestions?
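One alternative to counting files is to estimate the number of splits from the file sizes. Below is a rough, Hadoop-free model of the rule that, as far as I know, the old-API `org.apache.hadoop.mapred.FileInputFormat.getSplits` applies: the split size is `max(minSize, min(goalSize, blockSize))`, where `goalSize` is the total input size divided by the `mapred.map.tasks` hint, and a file's last chunk may be up to 10% larger than a split before an extra split is created. It assumes uncompressed, splittable files that share one block size. `SplitEstimator` and `estimateSplits` are hypothetical names, so check the `FileInputFormat` source for your Hadoop version before relying on it.

```java
import java.util.List;

/**
 * Rough model of how the old-API FileInputFormat turns input files into
 * splits (one map task per split). Assumes every file is splittable and
 * that all files share the same block size.
 */
public class SplitEstimator {

    // The last chunk of a file may be up to 10% larger than a split
    // instead of becoming a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    /**
     * @param fileSizes     length of each input file in bytes
     * @param blockSize     block size in bytes
     * @param requestedMaps the mapred.map.tasks hint, used to derive the goal size
     * @param minSplitSize  the mapred.min.split.size setting (default 1)
     */
    public static int estimateSplits(List<Long> fileSizes, long blockSize,
                                     int requestedMaps, long minSplitSize) {
        long totalSize = 0;
        for (long size : fileSizes) {
            totalSize += size;
        }
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        long minSize = Math.max(minSplitSize, 1);

        int splits = 0;
        for (long length : fileSizes) {
            if (length == 0) {
                splits++; // an empty file still gets one (empty) split
                continue;
            }
            long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
            long bytesRemaining = length;
            // Cut full-size splits while the remainder is clearly larger than one split
            while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
                splits++;
                bytesRemaining -= splitSize;
            }
            if (bytesRemaining != 0) {
                splits++; // whatever is left becomes the final split
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Two small files, as in the question: one split each.
        System.out.println(estimateSplits(List.of(10L * mb, 20L * mb), 64 * mb, 1, 1)); // prints 2
        // One 200 MB file with 64 MB blocks: several splits, hence several mappers.
        System.out.println(estimateSplits(List.of(200L * mb), 64 * mb, 1, 1)); // prints 4
    }
}
```

If the Hadoop libraries are available in the driver, asking the job's own `InputFormat` for its splits (in the old API, `conf.getInputFormat().getSplits(conf, conf.getNumMapTasks())`) avoids re-implementing this rule by hand.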


Solution

  • In the end, it seems that conf.get("mapred.map.tasks") DOES work after all when I build an executable jar file and run my program on the cluster or locally. Now the output of "map tasks" is correct.

    It failed only when I ran my MapReduce program locally on Hadoop through the Eclipse plugin, so this may be an issue with the plugin.

    I hope this will help someone else having the same issue. Thank you for your answers!