Tags: bash, ubuntu-18.04, monit

Monit gives error after first run of exec command


I am running Monit to monitor CPU usage on an Ubuntu server on DigitalOcean. Depending on the load, it should exec commands that start or stop additional servers behind a load balancer.

Here is my configuration:

check system host_name
    if cpu usage > 50% for 5 cycles then exec "/bin/bash /var/www/start.sh"
    if cpu usage < 30% for 5 cycles then exec "/bin/bash /var/www/stop.sh"

After the first run, Monit shuts down one server and then goes into an error state. Here is part of the log:

[UTC Jun 6 10:08:13] info : 'host_name' Monit reloaded

[UTC Jun 6 10:08:13] warning : 'host_name' cpu usage of 0.5% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:10:13] warning : 'host_name' cpu usage of 1.6% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:12:13] warning : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:14:13] warning : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:16:13] error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:16:13] info : 'host_name' exec: '/bin/bash /var/www/stop.sh'

[UTC Jun 6 10:18:13] error : 'host_name' cpu usage of 0.5% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:20:13] error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:22:13] error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:24:13] error : 'host_name' cpu usage of 0.2% matches resource limit [cpu usage < 30.0%]

[UTC Jun 6 10:26:13] error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]

Once it reports the error, it never runs the command again.

The Bash scripts themselves run without errors.

What am I doing wrong?


Solution

  • Let's break apart what Monit is being asked to do:

    if cpu usage < 30% for 5 cycles then exec "/bin/bash /var/www/stop.sh"

     warning : 'host_name' cpu usage of 0.5% matches resource limit [cpu usage < 30.0%]
     warning : 'host_name' cpu usage of 1.6% matches resource limit [cpu usage < 30.0%]
     warning : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
     warning : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
    

    Your CPU usage is below 30%, so the first four cycles each log a warning, but no action is taken yet.

    if cpu usage < 30% for 5 cycles then exec "/bin/bash /var/www/stop.sh"

    error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
    info : 'host_name' exec: '/bin/bash /var/www/stop.sh'
    

    On the 5th consecutive cycle the condition is treated as an error, and /var/www/stop.sh is run.

    error : 'host_name' cpu usage of 0.5% matches resource limit [cpu usage < 30.0%]
    error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
    error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
    error : 'host_name' cpu usage of 0.2% matches resource limit [cpu usage < 30.0%]
    error : 'host_name' cpu usage of 0.3% matches resource limit [cpu usage < 30.0%]
    

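    Monit runs an exec action once, when the check changes to the failed state. Since it has no further instructions, on later cycles it just logs the error again (because the condition still holds) without running the script. If you want Monit to run stop.sh again while the condition persists, you need to tell it to, and how often. For example: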

    if cpu usage < 30% for 5 cycles then exec "/bin/bash /var/www/stop.sh" repeat every 5 cycles
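
To make the difference concrete, here is a rough Bash sketch that models the behaviour described above. It is only an illustration, not Monit's actual logic: the `simulate` function and its arguments are hypothetical names, and it assumes the repeat counter restarts each time the action runs.

```shell
#!/usr/bin/env bash
# Simplified model of "for N cycles ... exec ... [repeat every M cycles]".
# This is an illustration only, not Monit's real implementation.
#
# simulate CYCLES THRESHOLD REPEAT
#   CYCLES    - number of consecutive poll cycles in which the test matches
#   THRESHOLD - the "for N cycles" value from the rule
#   REPEAT    - the "repeat every M cycles" value, or 0 for no repeat
simulate() {
    local cycles=$1 threshold=$2 repeat=$3
    local cycle since_exec=0
    for ((cycle = 1; cycle <= cycles; cycle++)); do
        if ((cycle < threshold)); then
            # Condition matches, but not yet for enough cycles.
            echo "cycle $cycle: warning"
        elif ((cycle == threshold)); then
            # The check changes to the failed state: the action runs once.
            echo "cycle $cycle: error + exec"
            since_exec=0
        else
            since_exec=$((since_exec + 1))
            if ((repeat > 0 && since_exec == repeat)); then
                # Assumed behaviour: re-run the action after REPEAT more cycles.
                echo "cycle $cycle: error + exec"
                since_exec=0
            else
                # Still failed, nothing new to do: just log the error.
                echo "cycle $cycle: error"
            fi
        fi
    done
}

echo "--- without repeat ---"
simulate 10 5 0
echo "--- with repeat every 5 cycles ---"
simulate 10 5 5
```

Without repeat, the model prints "exec" only at cycle 5, which matches the log in the question. With repeat every 5 cycles, it prints "exec" at cycle 5 and again at cycle 10.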