
Configure scollector to log if a process is running


I'm trying to use Bosun to determine whether certain processes are running, and eventually to alert when they go down. I'm probably misinterpreting the docs, but I can't figure this out.

Bosun is running just fine. scollector is running on Ubuntu 14 LTS and is picking up my config file correctly.

Here is what I have in my scollector.toml:

host="blah:8070"
hostname="cass01"

[[Process]]
  command =  "^.*(java).*(CassandraDaemon)$"
  name = "Cassandra"

I would then expect to see a metric titled "cassandra" somewhere in Bosun under my host cass01, but it's nowhere to be seen. Other metrics are there.


Solution

  • Right now, Command is a partial match on the path to the process binary, up to the first space delimiter; it is not a regex, so the pattern in the question will not match. The Args parameter is a regex used to differentiate between multiple instances of the process. So for a Java process you would use something like:

    [[Process]]
      Command = "java"
      Name = "Cassandra"
      Args = "CassandraDaemon$"
    

    This would match a command line like:

    /usr/bin/java /usr/bin/CassandraDaemon
    

    This assumes that /proc/<pid>/cmdline for that process ends in CassandraDaemon. If it does not end in that string, change Args to just "CassandraDaemon", which matches any java process whose arguments contain that string.

    Also, some processes rewrite their cmdline into something other than a NUL-delimited string. In those cases the match has to be done with Command, because Args expects NUL delimiters. Example:

    cat /proc/80156/cmdline | hexdump -C
    00000000  2f 75 73 72 2f 62 69 6e  2f 72 65 64 69 73 2d 73  |/usr/bin/redis-s|
    00000010  65 72 76 65 72 20 2a 3a  36 33 37 39 00 00 00 00  |erver *:6379....|
    00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    00000030  00                                                |.|
    00000031
    
    # Example for a cmdline without NUL (00) delimiters between args
    [[Process]]
      Command = "redis-server *:6379"
      Name = "redis-core"
    

    Once these are in place with the correct matching values, you should see metrics show up under linux.proc.*, where the name tag matches the Name used in the TOML file.
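
To check a Command/Args pair before restarting scollector, the matching rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not scollector's actual code: the helper names split_cmdline and matches are hypothetical, and joining the arguments with spaces before applying the Args regex is an assumption made for this example.

```python
import re
from typing import List


def split_cmdline(raw: bytes) -> List[str]:
    """Split the raw bytes of /proc/<pid>/cmdline into fields on NUL bytes."""
    # Trailing NULs (the terminator, or padding left behind by a process
    # that rewrote its own command line) would yield empty fields, so
    # strip them before splitting.
    text = raw.decode("utf-8", errors="replace").rstrip("\0")
    return text.split("\0") if text else []


def matches(raw: bytes, command: str, args: str = "") -> bool:
    """Approximate the Command/Args matching described in the answer."""
    fields = split_cmdline(raw)
    if not fields:
        return False
    # Command: plain partial (substring) match on the first field.
    if command not in fields[0]:
        return False
    # Args: regex searched in the remaining fields.
    # Assumption: the fields are joined with single spaces.
    return re.search(args, " ".join(fields[1:])) is not None


# NUL-delimited command line, as in the Cassandra example.
java = b"/usr/bin/java\0/usr/bin/CassandraDaemon\0"
print(matches(java, "java", "CassandraDaemon$"))   # True

# Rewritten command line with spaces instead of NULs, as in the redis dump.
redis = b"/usr/bin/redis-server *:6379" + b"\0" * 21
print(matches(redis, "redis-server *:6379"))       # True
print(matches(redis, "redis-server", "6379"))      # False
```

The last call returns False because the redis command line contains no NUL between the binary and "*:6379", so the whole string lands in the first field and Args has nothing to search. To run the check against a live process, read its bytes with open("/proc/<pid>/cmdline", "rb").read() and pass them as raw.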