Tags: apache-spark, hadoop, amazon-ec2, hadoop-yarn

Spark app fails after ACCEPTED state for a long time. Log says Socket timeout exception


I have Hadoop 3.2.2 running on a cluster with one name node, two data nodes, and one resource manager node. I tried to run the SparkPi example in cluster mode, with spark-submit issued from my local machine. YARN accepts the job, but the application never leaves the ACCEPTED state. The terminal where I submitted the job keeps printing

    2021-06-05 13:10:03,881 INFO yarn.Client: Application report for application_1622897708349_0001 (state: ACCEPTED)

until the application eventually fails with a socket timeout exception.
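For reference, a submission along these lines would reproduce the setup described above; the jar path, Spark/Scala versions, and argument are illustrative assumptions, not taken from the post:

```shell
# Hypothetical SparkPi submission in YARN cluster mode.
# --master yarn resolves the resource manager from the Hadoop config
# pointed to by HADOOP_CONF_DIR (yarn-site.xml), so that directory must
# be set on the machine running spark-submit.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_2.12-3.1.2.jar 1000
```

Because the deploy mode is `cluster`, the driver runs inside the YARN application master on the cluster, while the local terminal only polls YARN for the application report shown above.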

I tried increasing spark.executor.heartbeatInterval to 3600 seconds, with no luck. I also tried submitting from the name node, thinking there might be a connection issue with my local machine, but I still could not get it to run.
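The heartbeat change mentioned above can be passed at submit time with `--conf`. One detail worth checking when doing this: Spark requires spark.network.timeout to be larger than spark.executor.heartbeatInterval, so raising only the heartbeat interval to 3600s without also raising the network timeout will be rejected. A sketch (the jar path and versions are again assumptions):

```shell
# spark.network.timeout must exceed spark.executor.heartbeatInterval,
# otherwise Spark refuses the configuration at startup.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.heartbeatInterval=3600s \
  --conf spark.network.timeout=4000s \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_2.12-3.1.2.jar 1000
```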


Solution

  • Found the answer, although I don't know why it works: adding the node's private IP address to the security group in AWS did the trick.
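One way to add such a rule is with the AWS CLI; the security group ID and private address below are illustrative placeholders, not values from the post:

```shell
# Hypothetical: authorize traffic from a node's private IP (as a /32
# CIDR) on all protocols in the cluster's security group, so the
# cluster nodes can reach each other on their private addresses.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol all \
  --cidr 10.0.1.15/32
```

This is consistent with the symptom: if the security group blocks traffic between the nodes' private addresses, the application master can be scheduled (state ACCEPTED) but its connections back to other nodes time out, which matches the socket timeout in the logs.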