linux, process, monitoring, zabbix

Zabbix agent unable to detect PID of the running process


Some of my triggers report a process as unavailable, but when I check on the host the process is running fine. The trigger expression is:

{$hostname:proc.num[,,,/etc/alternatives/java].last()}=0

It works fine for some hosts, but on others the trigger fires "process unavailable" and sends an alert.

Affected host:

# ps ax | grep java
 1717 ?        Ssl  119:15 /etc/alternatives/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Djsse.enableSNIExtension=false -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=-1 --httpsPort=8443 --ajp13Port=8009 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20 --httpsCertificate=/var/lib/jenkins/.ssl/hostssl.crt --httpsPrivateKey=/var/lib/jenkins/.ssl/hostssl.key

Zabbix log:

  2000:20160901:081336.721 Starting Zabbix Agent [$hostname]. Zabbix 2.2.8 (revision 51174).
  2000:20160901:081336.721 using configuration file: /etc/zabbix/zabbix_agentd.conf
  2002:20160901:081336.724 agent #0 started [collector]
  2004:20160901:081336.724 agent #2 started [listener #2]
  2005:20160901:081336.725 agent #3 started [listener #3]
  2006:20160901:081336.725 agent #4 started [active checks #1]
  2003:20160901:081336.725 agent #1 started [listener #1]
cat: /proc//status: No such file or directory
cat: /proc//status: No such file or directory
cat: /proc//status: No such file or directory
cat: /proc//status: No such file or directory

Host that reports Zabbix data properly:

# ps ax | grep java
 2472 ?        Ssl  1279:26 /etc/alternatives/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Djsse.enableSNIExtension=false -Dorg.apache.commons.jelly.tags.fmt.timeZone=Europe/Dublin -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=-1 --httpsPort=8443 --ajp13Port=8009 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20 --httpsCertificate=/var/lib/jenkins/.security/hostssl.crt --httpsPrivateKey=/var/lib/jenkins/.security/hostssl.key --httpsPort=8443

The Zabbix agent log on this host does not contain the line "cat: /proc//status: No such file or directory".

As I understand it, the problem is that the PID of the process is not discovered, so the trigger fires and the alert action runs.

Is there any way to troubleshoot this further, to see why the Zabbix agent cannot detect the PID of the running process on the affected machines?


Solution

  • The problem is resolved now.

    I used zabbix_get to query the Zabbix agent directly. That showed the agent could not see any processes owned by jenkins or by any other non-zabbix user.

    Googling brought me to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1032691

    Applying a custom SELinux policy resolved the issue.
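
The troubleshooting steps above can be sketched roughly as follows. This is a minimal sketch, not the exact commands the answer used: the helper names proc_num_key and query_agent, the host name affected-host.example.com, and the module name zabbix_local are all hypothetical, and the SELinux commands assume the audit tools (ausearch, audit2allow) are installed on the affected host. Review any generated policy before loading it.

```shell
#!/bin/sh
# Build a proc.num item key: proc.num[<name>,<user>,<state>,<cmdline>]
# (helper name is hypothetical, used only for this sketch)
proc_num_key() {
    printf 'proc.num[%s,%s,%s,%s]\n' "$1" "$2" "$3" "$4"
}

# Query an agent directly; $1 = agent host, $2 = item key.
# Must be run from a machine listed in the agent's Server= setting.
query_agent() {
    zabbix_get -s "$1" -k "$2"
}

# Same key as in the trigger expression from the question.
key=$(proc_num_key '' '' '' /etc/alternatives/java)
echo "$key"

# Step 1 (on the Zabbix server or proxy): compare a good and an affected host.
#   query_agent affected-host.example.com "$key"
# A result of 0 while "ps ax" shows the process suggests the agent cannot
# read other users' entries under /proc.

# Step 2 (on the affected host, as root): check for SELinux denials.
#   getenforce
#   ausearch -m avc -ts recent | grep zabbix

# Step 3 (assumption: the denials are the cause): build and load a local
# policy module from them. "zabbix_local" is a placeholder module name.
#   ausearch -m avc -ts recent | audit2allow -M zabbix_local
#   semodule -i zabbix_local.pp
```

Running the script as written only prints the item key; the agent query and SELinux commands are left as comments so they can be run by hand on the appropriate machine.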