Tags: hadoop, mapreduce, hadoop2, bigdata

Does a TaskTracker correspond to a mapper or a reducer in Hadoop?


I know that a mapper always performs a number of map operations and a reducer always performs a number of reduce operations. In other words, the mapping between a mapper (reducer) and map (reduce) operations is one-to-many.
Now my question: is the mapping between a TaskTracker and mappers one-to-one or one-to-many?
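The one-to-many relation between a mapper and its map operations can be sketched as follows. It mimics the shape of Hadoop's `Mapper.run()` (call `setup()` once, `map()` once per input record, then `cleanup()` once). It is plain Python, not the Hadoop API; the class `WordCountMapper` and the sample records are hypothetical.

```python
class WordCountMapper:
    """Hypothetical stand-in for a Hadoop Mapper subclass (not the real API)."""

    def setup(self):
        self.emitted = []

    def map(self, key, value):
        # One map() call handles exactly one input record:
        # key = byte offset of the line, value = the line itself.
        for word in value.split():
            self.emitted.append((word, 1))

    def cleanup(self):
        pass

    def run(self, records):
        # Same shape as Hadoop's Mapper.run(): setup once,
        # map() once per record, cleanup once. Returns the number of map() calls.
        self.setup()
        calls = 0
        for key, value in records:
            self.map(key, value)
            calls += 1
        self.cleanup()
        return calls


# One input split with three records -> one mapper, three map() calls.
split = [(0, "the quick brown fox"), (20, "jumps over"), (31, "the lazy dog")]
mapper = WordCountMapper()
print(mapper.run(split))      # 3
print(len(mapper.emitted))    # 9
```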


Solution

  • First, let me explain exactly what a TaskTracker is:

    A TaskTracker is a node in the cluster that accepts tasks - Map, Reduce and Shuffle operations - from a JobTracker.

    Every TaskTracker is configured with a set of slots; these indicate the number of tasks it can accept. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data; if there is none, it looks for an empty slot on a machine in the same rack.
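A minimal sketch of the slot-plus-locality rule described above, assuming a simplified model: each TaskTracker has a host, a rack and a count of free slots, and `replicas` lists the (host, rack) pairs of the DataNodes holding the task's input block. The names `TaskTracker` and `pick_tracker` are hypothetical, and the final "any free slot" fallback is an added assumption; the real Hadoop schedulers are pluggable and driven by heartbeats, so treat this only as an illustration of the preference order.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    host: str
    rack: str
    free_slots: int


def pick_tracker(trackers, replicas):
    """Pick a TaskTracker for one task whose input block lives on `replicas`.

    Preference order: node-local, then rack-local, then (assumption) any free slot.
    """
    data_hosts = {host for host, _ in replicas}
    data_racks = {rack for _, rack in replicas}
    free = [t for t in trackers if t.free_slots > 0]
    preferences = (
        lambda t: t.host in data_hosts,  # same server as a DataNode with the data
        lambda t: t.rack in data_racks,  # same rack as a DataNode with the data
        lambda t: True,                  # assumed fallback: any free slot, off-rack
    )
    for matches in preferences:
        for t in free:
            if matches(t):
                t.free_slots -= 1  # the task now occupies one of its slots
                return t
    return None  # no free slot anywhere: the task has to wait


trackers = [
    TaskTracker("node1", "rack1", free_slots=0),  # holds the data, but busy
    TaskTracker("node2", "rack1", free_slots=1),
    TaskTracker("node3", "rack2", free_slots=1),
]
block = [("node1", "rack1")]
print(pick_tracker(trackers, block).host)  # node2 (rack-local)
print(pick_tracker(trackers, block).host)  # node3 (off-rack)
print(pick_tracker(trackers, block))       # None
```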

    The TaskTracker spawns separate JVM processes to do the actual work; this ensures that a task failure does not take down the TaskTracker itself. The TaskTracker monitors these spawned processes, capturing their output and exit codes. When a process finishes, successfully or not, the TaskTracker notifies the JobTracker. The TaskTrackers also send heartbeat messages to the JobTracker periodically (typically every few seconds) to reassure it that they are still alive. These messages also tell the JobTracker how many slots are available, so it can stay up to date on where in the cluster work can be delegated.
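The child-process and heartbeat behaviour described above can be sketched as below. This is a hypothetical, simplified model, not Hadoop's `TaskTracker` class: a Python child process stands in for the child JVM, `launch` and `heartbeat` are illustrative names, and the heartbeat is just a dict carrying the free-slot count and the exit codes and output of tasks that finished since the last report.

```python
import subprocess
import sys


class TaskTracker:
    """Hypothetical, simplified model of a TaskTracker (not Hadoop's class)."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.children = {}   # task_id -> Popen of the running child process
        self.completed = []  # (task_id, exit_code, stdout) not yet reported

    def launch(self, task_id, code):
        if len(self.children) >= self.slots:
            return False     # no free slot; the JobTracker must look elsewhere
        # Run the task in a separate OS process (Hadoop uses a child JVM),
        # so a crashing task cannot take down the tracker itself.
        self.children[task_id] = subprocess.Popen(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        return True

    def heartbeat(self):
        # Reap finished children, capturing their output and exit codes.
        for task_id, proc in list(self.children.items()):
            if proc.poll() is not None:
                out, _err = proc.communicate()
                self.completed.append((task_id, proc.returncode, out.strip()))
                del self.children[task_id]
        # Liveness signal + free slots + finished tasks since the last heartbeat.
        report = {"tracker": self.name,
                  "free_slots": self.slots - len(self.children),
                  "completed": self.completed}
        self.completed = []
        return report


tt = TaskTracker("node1", slots=2)
tt.launch("m_000", "print('ok')")
tt.launch("m_001", "import sys; sys.exit(3)")   # a failing task
print(tt.launch("m_002", "pass"))                # False: both slots busy
for proc in list(tt.children.values()):
    proc.wait()
print(tt.heartbeat())
```

A failing child only shows up as a non-zero exit code in the next heartbeat; the tracker object itself keeps running, which is the point of the separate process.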

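    So yes, this leads to the point that one TaskTracker runs many tasks (the actual map and reduce tasks of a job), several at once up to its slot limits and many more over its lifetime. The answer to your question is therefore one-to-many:

    one (TaskTracker) to many (mappers/reducers)

    In the same way, one JobTracker coordinates many TaskTrackers.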
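To make the one-to-many answer concrete, here is a small sketch assuming a simplified model in which map tasks start in lockstep "waves" that fill every free slot. The function `assign` and the tracker names are hypothetical; in Hadoop 1 the per-tracker slot counts come from `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` in `mapred-site.xml`, and real scheduling is heartbeat-driven rather than in waves.

```python
def assign(num_tasks, trackers, map_slots):
    """Spread num_tasks map tasks over TaskTrackers that each have map_slots slots.

    Simplification (assumption): tasks start in lockstep waves, and each wave
    fills every tracker's free slots. Returns ({tracker: [task ids]}, waves).
    """
    if map_slots < 1 or not trackers:
        raise ValueError("need at least one tracker with at least one map slot")
    assignment = {t: [] for t in trackers}
    pending = [f"m_{i:03d}" for i in range(num_tasks)]
    waves = 0
    while pending:
        waves += 1
        for t in trackers:
            # Each tracker takes up to map_slots tasks in this wave.
            take, pending = pending[:map_slots], pending[map_slots:]
            assignment[t].extend(take)
    return assignment, waves


plan, waves = assign(10, ["tt1", "tt2"], map_slots=2)
for tracker, tasks in plan.items():
    print(tracker, len(tasks), tasks)
print("waves:", waves)
```

Each task id appears under exactly one tracker, while each tracker ends up with several task ids: one TaskTracker to many mappers.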