Tags: apache-spark, sigterm, apache-spark-standalone

Why did Spark standalone Worker node-1 terminate after RECEIVED SIGNAL 15: SIGTERM?


Note: This error was thrown before any application components were executed by Spark.

Logs
Worker Node1:

17/05/18 23:12:52 INFO Worker: Successfully registered with master spark://spark-master-1.com:7077  
17/05/18 23:58:41 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM

Master Node:

17/05/18 23:12:52 INFO Master: Registering worker spark-worker-1.com:56056 with 2 cores, 14.5 GB RAM
17/05/18 23:14:20 INFO Master: Registering worker spark-worker-2.com:53986 with 2 cores, 14.5 GB RAM
17/05/18 23:59:42 WARN Master: Removing spark-worker-1.com-56056 because we got no heartbeat in 60 seconds
17/05/18 23:59:42 INFO Master: Removing spark-worker-2.com:56056
17/05/19 00:00:03 ERROR Master: RECEIVED SIGNAL 15: SIGTERM

Worker Node2:

17/05/18 23:14:20 INFO Worker: Successfully registered with master spark://spark-master-node-2.com:7077
17/05/18 23:59:40 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM

Solution

  • TL;DR I think someone explicitly ran the kill command or sbin/stop-worker.sh.

    "RECEIVED SIGNAL 15: SIGTERM" is reported by a shutdown hook to log TERM, HUP, INT signals on UNIX-like systems:

      /** Register a signal handler to log signals on UNIX-like systems. */
      def registerLogger(log: Logger): Unit = synchronized {
        if (!loggerRegistered) {
          Seq("TERM", "HUP", "INT").foreach { sig =>
            SignalUtils.register(sig) {
              log.error("RECEIVED SIGNAL " + sig)
              false // action did not handle the signal: escalate to the default handler (terminate)
            }
          }
          loggerRegistered = true
        }
      }
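
    To see this mechanism outside Spark, here is a minimal, hypothetical sketch (not Spark's code) that registers the same kind of logging handler through sun.misc.Signal, the internal JDK API that SignalUtils wraps. It assumes a UNIX-like OS and a JDK that still exposes sun.misc.Signal, and it mimics the message format from the Worker log above:

      import sun.misc.{Signal, SignalHandler}

      object SignalLogDemo {
        def main(args: Array[String]): Unit = {
          Seq("TERM", "HUP", "INT").foreach { name =>
            val signal = new Signal(name)
            var previous: SignalHandler = null
            previous = Signal.handle(signal, new SignalHandler {
              override def handle(s: Signal): Unit = {
                // same message shape as the Worker log above
                System.err.println(s"RECEIVED SIGNAL ${s.getNumber}: SIG${s.getName}")
                // restore the previous (default) handler and re-raise,
                // so the JVM still terminates -- just as Spark escalates
                // when no registered action handles the signal
                Signal.handle(signal, previous)
                Signal.raise(s)
              }
            })
          }
          Thread.sleep(60 * 1000) // keep the JVM alive; run `kill <pid>` from another shell
        }
      }

    Run it, send kill <pid> from another terminal, and the same RECEIVED SIGNAL line appears right before the process exits.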
    

    In your case it means that the process received a SIGTERM asking it to stop:

    The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate.

    That's what is sent when you execute kill (which sends SIGTERM by default) or run the ./sbin/stop-master.sh or ./sbin/stop-worker.sh shell scripts, which in turn call sbin/spark-daemon.sh with the stop command that kills the JVM process of a master or a worker:

    kill "$TARGET_ID" && rm -f "$pid"