
Trying to understand launchd daemon state


I am trying to set up a launchd daemon for the Zabbix agent on macOS 10.13 High Sierra.

First I install the Zabbix agent with:

brew install zabbix --without-server-proxy

Then I create a property list named com.zabbix.zabbix_agentd.plist with this content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>AbandonProcessGroup</key>
        <true/>
        <key>GroupName</key>
        <string>zabbix</string>
        <key>KeepAlive</key>
        <dict>
                <key>SuccessfulExit</key>
                <false/>
        </dict>
        <key>Label</key>
        <string>com.zabbix.zabbix_agentd</string>
        <key>ProgramArguments</key>
        <array>
                <string>/usr/local/sbin/zabbix_agentd</string>
                <string>-c</string>
                <string>/opt/zabbix/zabbix_agentd.conf</string>
        </array>
        <key>RunAtLoad</key>
        <true/>
        <key>StandardErrorPath</key>
        <string>/var/log/zabbix/zabbix_agentd.error.log</string>
        <key>StandardOutPath</key>
        <string>/var/log/zabbix/zabbix_agentd.stdout.log</string>
        <key>UserName</key>
        <string>zabbix</string>
</dict>
</plist>

I load it with:

sudo launchctl load ./com.zabbix.zabbix_agentd.plist

I can confirm that the daemon started as expected with:

ps ax | grep zabbix_agentd | grep -v grep

I see 6 zabbix processes: 1 collector, 3 listeners, 1 active check, and the process that the launch daemon started:

8931   ??  S      0:00.01 /usr/local/sbin/zabbix_agentd -c /opt/zabbix/zabbix_agentd.conf

But when I run this command:

launchctl print system/com.zabbix.zabbix_agentd | grep state

I get this output:

state = waiting

I expected to see state = running. Why does that command tell me that the daemon is waiting when it has 6 running processes?

Is this "works as designed" or did I do something wrong?


Solution

  • This is sort of "works as designed", but I'd really say it's a result of a philosophical conflict between zabbix and launchd about how daemons should work.

    When you run zabbix_agentd, it "daemonizes" itself: it starts the actual daemon process as a background subprocess, and then the parent process exits. From that point on, the daemon process (and any subprocesses it starts) runs independently of whatever started it. This is the traditional way unix daemons operate.
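The effect can be imitated in a few lines of shell. This is only a sketch of the general pattern, not what zabbix_agentd does internally: a subshell stands in for the launcher process, `sleep 30` stands in for the real daemon, and the names `pidfile`, `worker_pid`, and `worker_alive` are made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo: a "launcher" (the subshell) starts a background worker
# and exits right away, the way a self-daemonizing program does.
pidfile=$(mktemp)
( sleep 30 >/dev/null 2>&1 & echo "$!" > "$pidfile" )
worker_pid=$(cat "$pidfile")
rm -f "$pidfile"

# The launcher is gone at this point, but the worker it started is not.
worker_alive=no
if kill -0 "$worker_pid" 2>/dev/null; then
    worker_alive=yes
    echo "launcher exited, worker $worker_pid is still running"
fi

kill "$worker_pid"   # clean up the demo worker
```

Anything that was only watching the launcher would conclude the program had quit, which is the position launchd is in with a self-daemonizing agent.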

    launchd, on the other hand, expects the daemons it manages to stay in the foreground and execute directly under it; this gives launchd much more ability to monitor and control its daemons than it would have if they distanced themselves from launchd.

    This is a common conflict between traditional unix daemons and launchd, and there are two ways to solve it: either get the daemon to run in the foreground (i.e. conform to the launchd way of doing things), or tell launchd not to worry that the daemon seems to have quit. zabbix_agentd doesn't seem to have anything like a --nodaemon option (according to these docs), so you have to adapt launchd (update: newer versions do, see below). The standard way of doing this, which is pretty much what you have in your .plist, is to add AbandonProcessGroup (so launchd doesn't kill the leftover processes when the process it started exits) and a KeepAlive dictionary with SuccessfulExit set to false (so launchd only restarts the job after a nonzero exit). This works, but it means that launchd cannot tell what's actually going on with the daemon: the process it launched has exited, so it reports the job as waiting even though the real daemon processes are still running.
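To compare launchd's view with what is actually running, a small helper along these lines may be handy. It is a sketch: `launchd_state` is a made-up function name, and it assumes the job's own `state = ...` line is the first such line in the `launchctl print` output.

```shell
# Hypothetical helper: read `launchctl print` output on stdin and print the
# value of the first "state = ..." line (leading whitespace is ignored).
launchd_state() {
    awk '$1 == "state" && $2 == "=" { print $3; exit }'
}
```

On the Mac it would be used as `launchctl print system/com.zabbix.zabbix_agentd | launchd_state`, next to the `ps ax | grep zabbix_agentd | grep -v grep` check from the question.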

    UPDATE: I was looking at an old version of zabbix_agentd. Stefan spotted that a -f (or --foreground) option was added to zabbix_agentd in version 3.0. With this, I'd recommend three changes: add --foreground to the ProgramArguments array; replace the KeepAlive dictionary with a simple <true/> (this tells launchd to auto-restart the daemon if it exits for any reason); and remove <key>AbandonProcessGroup</key><true/> (this option controls whether launchd cleans up leftover subprocesses if the main daemon process exits/crashes). The result should look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
            <key>GroupName</key>
            <string>zabbix</string>
            <key>Label</key>
            <string>com.zabbix.zabbix_agentd</string>
            <key>ProgramArguments</key>
            <array>
                    <string>/usr/local/sbin/zabbix_agentd</string>
                    <string>-c</string>
                    <string>/opt/zabbix/zabbix_agentd.conf</string>
                    <string>--foreground</string>
            </array>
            <key>RunAtLoad</key>
            <true/>
            <key>KeepAlive</key>
            <true/>
            <key>StandardErrorPath</key>
            <string>/var/log/zabbix/zabbix_agentd.error.log</string>
            <key>StandardOutPath</key>
            <string>/var/log/zabbix/zabbix_agentd.stdout.log</string>
            <key>UserName</key>
            <string>zabbix</string>
    </dict>
    </plist>
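After editing the plist, the job has to be reloaded before launchd picks up the change. A rough sketch, wrapped in a made-up function name (`reload_zabbix_agent`) and assuming the same file name and label as in the question:

```shell
# Hypothetical helper: reload the job from an edited plist (run on the Mac).
reload_zabbix_agent() {
    plist=${1:-./com.zabbix.zabbix_agentd.plist}
    plutil -lint "$plist" || return 1      # stop here if the XML is malformed
    sudo launchctl unload "$plist"         # drop the old job definition
    sudo launchctl load "$plist"           # load the edited one
    launchctl print system/com.zabbix.zabbix_agentd | grep state
}
```

With the agent staying in the foreground, the last command should now be reporting on the process launchd itself started, so the state it prints should match what ps shows.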