apache-spark, ubuntu, systemd, systemctl, init.d

Failed to stop Apache Spark Master or Slave using Systemd


Perspectives


Actually I need to configure two service files: one for the Spark Master and another for the Spark Slave (Worker) node. The environment and service configurations are as follows:

Configurations


/opt/cli/spark-3.3.0-bin-hadoop3/etc/env


JAVA_HOME="/usr/lib/jvm/java-17-openjdk-amd64"
SPARK_HOME="/opt/cli/spark-3.3.0-bin-hadoop3"
PYSPARK_PYTHON="/usr/bin/python3"
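
This file uses KEY="value" lines, which both systemd's EnvironmentFile= parser and a POSIX shell accept, so it can be sanity-checked from a terminal before wiring it into the units (a quick check, assuming the paths above):

set -a; . /opt/cli/spark-3.3.0-bin-hadoop3/etc/env; set +a
"$JAVA_HOME/bin/java" -version   # JAVA_HOME should point at a JDK
ls "$SPARK_HOME/sbin"            # the start/stop scripts should live here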

/etc/systemd/system/spark-master.service


[Unit]
Description=Apache Spark Master
Documentation=https://spark.apache.org/docs/3.3.0
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking

WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target

/etc/systemd/system/spark-slave.service


[Unit]
Description=Apache Spark Slave
Documentation=https://spark.apache.org/docs/3.3.0
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking

WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://spark.cdn.chorke.org:7077
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
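
With both unit files in place, the services can be registered and started the usual way (add systemctl enable if they should also come up at boot):

sudo systemctl daemon-reload
sudo systemctl start spark-master.service
sudo systemctl start spark-slave.service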

Outcome


Both services start successfully, but they fail to stop cleanly: after running systemctl stop, both the Apache Spark Master and the Slave end up in a failed state.

Spark Master Stop Status


× spark-master.service - Apache Spark Master
     Loaded: loaded (/etc/systemd/system/spark-master.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2022-09-26 18:43:39 +08; 8s ago
       Docs: https://spark.apache.org/docs/3.3.0
    Process: 488887 ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh (code=exited, status=0/SUCCESS)
    Process: 489000 ExecStartPost=/bin/bash -c echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid (code=exited, status=0/SUCCESS)
    Process: 489484 ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh (code=exited, status=0/SUCCESS)
   Main PID: 488903 (code=exited, status=143)
        CPU: 4.813s

Spark Slave Stop Status


× spark-slave.service - Apache Spark Slave
     Loaded: loaded (/etc/systemd/system/spark-slave.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Mon 2022-09-26 18:38:22 +08; 15s ago
       Docs: https://spark.apache.org/docs/3.3.0
    Process: 489024 ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://ns12-pc04:7077 (code=exited, status=0/SUCCESS)
    Process: 489145 ExecStartPost=/bin/bash -c echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid (code=exited, status=0/SUCCESS)
    Process: 489174 ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh (code=exited, status=0/SUCCESS)
   Main PID: 489040 (code=exited, status=143)
        CPU: 4.306s
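
For anyone reproducing this, the exit status and the scripts' own output can be inspected in more detail through the journal, for example:

journalctl -u spark-master.service -n 50 --no-pager
journalctl -u spark-slave.service -n 50 --no-pager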

Expected Behavior


Your guidance on how to shut down both the Master and the Slave node without any error would be appreciated.


Solution

Theoretical Solution


In this case you would have to write your own wrapper script that manipulates the shutdown to force exit code 0 instead of 143. If you are lazy enough, like me, you can instead set SuccessExitStatus=143: when systemd stops the service, the Spark daemon is terminated with SIGTERM and exits with status 143 (128 + 15), but by default systemd treats only exit code 0 as success, so it marks the unit as failed. Declaring 143 as an additional success status changes that default behavior.
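
If you prefer not to modify the unit files directly, the same override can be applied as a drop-in instead (a minimal sketch; systemctl edit opens an editor, writes an override.conf for the unit, and reloads the daemon on save):

sudo systemctl edit spark-master.service
sudo systemctl edit spark-slave.service

# content to add in the editor for each unit:
[Service]
SuccessExitStatus=143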

Practical Solution


/etc/systemd/system/spark-master.service


[Unit]
Description=Apache Spark Master
Documentation=https://spark.apache.org/docs/3.3.0
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143

WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-master.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-master.sh
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target
    

/etc/systemd/system/spark-slave.service


[Unit]
Description=Apache Spark Slave
Documentation=https://spark.apache.org/docs/3.3.0
Wants=network-online.target
After=network-online.target

[Service]
User=spark
Group=spark
Type=forking
SuccessExitStatus=143

WorkingDirectory=/opt/cli/spark-3.3.0-bin-hadoop3/sbin
EnvironmentFile=/opt/cli/spark-3.3.0-bin-hadoop3/etc/env
ExecStartPost=/bin/bash -c "echo $MAINPID > /opt/cli/spark-3.3.0-bin-hadoop3/etc/spark-slave.pid"
ExecStart=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/start-slave.sh spark://spark.cdn.chorke.org:7077
ExecStop=/opt/cli/spark-3.3.0-bin-hadoop3/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
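
After updating the units, reload systemd and restart the services; stopping them should now leave both units in the inactive (dead) state rather than failed:

sudo systemctl daemon-reload
sudo systemctl restart spark-master.service spark-slave.service
sudo systemctl stop spark-slave.service spark-master.service
systemctl status spark-master.service spark-slave.service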