apache-spark akka timeoutexception

Spark on a cluster: what do the following errors mean, and what are the possible causes?



I get the following errors/warnings:

1) WARN AkkaRpcEndpointRef: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@58149ee3,BlockManagerId(2, 192.168.0.171, 49714))] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]

2) ERROR CoarseGrainedExecutorBackend: Driver 192.168.0.131:41837 disassociated! Shutting down.

I'm running a Spark (v1.4.0) app on a cluster of 4 machines in which the driver has less memory (4 GB) than the workers (8 GB each). Is it possible that the driver causes these errors because of its workload?


Solution

  • The driver was not able to respond to the executors' heartbeats in time because it was under stress during the computation. The problem was solved simply by adding more RAM to the driver.
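
If the extra RAM is to be used by the driver JVM, the driver heap usually has to be raised as well. The following is only a minimal sketch, assuming the job is launched with spark-submit and that the `--driver-memory` option (equivalent to `spark.driver.memory`) is how the heap is set. The helper name `build_submit_command`, the jar name, the main class, the master URL and the `8g` value are all hypothetical placeholders, not details from the question.

```python
import shlex


def build_submit_command(app_jar, main_class, master_url, driver_memory="8g"):
    """Assemble a spark-submit invocation that raises the driver heap.

    All argument values are illustrative; adapt them to the real cluster.
    """
    return [
        "spark-submit",
        "--master", master_url,
        "--class", main_class,
        # --driver-memory sets the driver JVM heap (same as spark.driver.memory).
        "--driver-memory", driver_memory,
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application and master; replace with real values.
    cmd = build_submit_command(
        "my-app.jar", "com.example.MyApp", "spark://master-host:7077"
    )
    # Print the command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

In client mode the driver JVM has already started by the time application code runs, so the driver memory is passed on the command line (or in the properties file) rather than set on `SparkConf` inside the app.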