Search code examples
javaapachehadoopgiraph

Why is Speculative execution doesn't make sense for Giraph?


recently I am running some benchmarks to learn about failover mechanism in Giraph.

Actually I'm curious; when a worker in a job gets slower, the other workers will just wait for it. Later I found something like this in GiraphJob.java:

// Speculative execution doesn't make sense for Giraph
giraphConfiguration.setBoolean("mapred.map.tasks.speculative.execution", false);

Does anyone know why speculative execution is not enabled in Giraph?

Thanks


Solution

  • At first lets bring back in our mind what speculative execution is. Quoted from Yahoo's Hadoop tutorial:

    Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes. By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first. Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively

    If I got Giraph right they don't use speculative execution because they use their own iterative calculation paradigm where it didn't fit in. This paradigm is inspired by google's pregel which provides a more graph node centric view on the data. Furthermore fault-tolerance is created by checkpointing which means that each iteration, also called superstep, calculates all incoming messages per graph node and afterwards the messages are distributed between the nodes.

    Simply spoken MapReduce is not used in it's original way and therefore speculative execution for giraph makes no sense.