python apache-spark hadoop distributed-computing

Active tasks is a negative number in Spark UI


While running a Spark job, I saw this in the Spark UI:

[Screenshot: Spark UI showing a negative number of active tasks]

Here the number of active tasks is negative (it is the total number of tasks minus the number of completed tasks).

What is the source of this error?


Note that I have many executors. However, one task appears to have been idle (I don't see any progress), while another identical task completed normally.


Also, this is related: that mail. I can confirm that many tasks are being created, since I am using 1k or 2k executors.

The error I am getting is a bit different:

16/08/15 20:03:38 ERROR LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
16/08/15 20:07:18 WARN TaskSetManager: Lost task 20652.0 in stage 4.0 (TID 116652, myfoo.com): FetchFailed(BlockManagerId(61, mybar.com, 7337), shuffleId=0, mapId=328, reduceId=20652, message=
org.apache.spark.shuffle.FetchFailedException: java.util.concurrent.TimeoutException: Timeout waiting for task.

Solution

  • It is a Spark issue. It occurs when executors restart after failures. A JIRA issue for it already exists; see https://issues.apache.org/jira/browse/SPARK-10141 for more details.
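
To illustrate how a counter like this can end up negative, here is a minimal toy sketch in Python. It is not Spark's actual code: the class ActiveTaskCounter and the function replay are hypothetical names made up for this example. It assumes the UI counter is incremented on a "task start" event and decremented on a "task end" event, so that losing start events (as the LiveListenerBus message in the question suggests can happen) leaves the counter below zero.

```python
class ActiveTaskCounter:
    """Toy model of a UI counter driven by listener events (hypothetical, not Spark code)."""

    def __init__(self):
        self.active = 0
        self.completed = 0

    def on_task_start(self):
        # Assumption: a start event increments the active count.
        self.active += 1

    def on_task_end(self):
        # Assumption: an end event decrements the active count,
        # whether or not the matching start event was ever seen.
        self.active -= 1
        self.completed += 1


def replay(events, dropped_indices=frozenset()):
    """Feed events to a fresh counter, skipping the ones marked as dropped."""
    counter = ActiveTaskCounter()
    for i, event in enumerate(events):
        if i in dropped_indices:
            continue  # simulates an event that never reached the listener
        if event == "start":
            counter.on_task_start()
        elif event == "end":
            counter.on_task_end()
    return counter


events = ["start", "start", "end", "end"]

healthy = replay(events)                       # every event delivered
lossy = replay(events, dropped_indices={0, 1})  # both start events dropped

print(healthy.active, lossy.active)
```

In the healthy run the active count returns to zero. In the lossy run only the end events arrive, so the count goes negative even though both tasks completed. This matches the symptom in the question, but whether it is the exact mechanism in a given Spark version is an assumption here; the JIRA issue linked above is the authoritative description.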