daemon watchdog systemd

systemd restart on watchdog timeout does not terminate the previous hung instance


I'm trying to set up a systemd service configuration that restarts the service on watchdog failure. If my application does not call sd_notify() in time, systemd spawns a new instance. However, the previous instance is not killed. After some time, I have many instances of my application running.

$ systemctl status my-daemon.service

  Loaded: loaded (/lib/systemd/system/my-daemon.service; disabled)
  Active: active (running) since Tue, 26 Aug 2014 10:27:46 +0000; 7s ago
Main PID: 1433 (attendance-syst)
  CGroup: name=systemd:/system/my-daemon.service
      ├ 1281 /usr/local/bin/my-daemon
      ├ 1384 /usr/local/bin/my-daemon
      ├ 1407 /usr/local/bin/my-daemon
      └ 1433 /usr/local/bin/my-daemon
      ...

This is part of my service file:

[Service]
ExecStart=/usr/local/bin/my-daemon
TimeoutStopSec=5
WatchdogSec=10
Restart=on-failure
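
For context, with WatchdogSec=10 the daemon is expected to send a WATCHDOG=1 keepalive to systemd before each 10-second window expires. The following is a minimal sketch of that keepalive in Python rather than through the C sd_notify() call, assuming the service is started by systemd so that the NOTIFY_SOCKET and WATCHDOG_USEC environment variables are set. The helper names sd_notify and watchdog_interval_seconds are illustrative and not part of the original daemon.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process is not
    running under systemd with notifications enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_interval_seconds(default: float = 0.0) -> float:
    """Return half of WATCHDOG_USEC in seconds, or `default` if it is unset.

    Pinging at half the configured timeout leaves headroom for scheduling delays.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return default
    return int(usec) / 1_000_000 / 2
```

In the daemon's main loop, calling sd_notify("WATCHDOG=1") once per watchdog_interval_seconds() keeps the service marked healthy. If the loop hangs and the pings stop, systemd treats that as a watchdog failure.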

How can I configure systemd to kill instances that fail the watchdog check?

I have already read the manual page, but it didn't help.

I thought Restart=on-failure would restart a hung process by default...


Solution

  • This is a bug in older systemd versions, and it has already been fixed in newer ones.

    • In systemd 208 (available for Debian jessie) it works correctly.

    • In systemd 204 (available for Debian wheezy via backports) it is still broken.

    I haven't found the exact release in which it was fixed.
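
To see which of these versions a given machine is running, the installed version can be read from the first line of `systemctl --version`. The sketch below is one way to do that. The function names are hypothetical, and the threshold of 208 is only the version reported as working above, not a confirmed fix release.

```python
import re
import subprocess
from typing import Optional

# Version reported as working in the answer; the exact fix release is unknown.
KNOWN_GOOD_VERSION = 208


def parse_systemd_version(text: str) -> Optional[int]:
    """Extract the major version from `systemctl --version` output."""
    match = re.search(r"^systemd (\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def installed_systemd_version() -> Optional[int]:
    """Run `systemctl --version` and return the parsed version number."""
    result = subprocess.run(
        ["systemctl", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_systemd_version(result.stdout)
```

Comparing installed_systemd_version() against KNOWN_GOOD_VERSION gives a rough indication only. Versions between 204 and 208 are not covered by the observations above.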