Search code examples
mysqlmariadbreplicationmonit

How to set Monit up to monitor MariaDB replication?


I use Monit 5.27.2 and MariaDB 10.5.15 on my Debian 11 servers.

My servers are set up in Multi-Source MariaDB Replication.

Sometimes, replication stops and I am not notified. I have not seen anywhere a prepared configuration for Monit to send such alerts.

The errors are visible with the following query:

SHOW SLAVE STATUS;

With those values particularly interesting:

slave_io_running and slave_sql_running values

last_error value

I have not found any way to use Monit to monitor directly the result of a SQL statement. However, it seems possible to monitor the return code of a command with versions before Monit 5.29.0 (and the output of a command after).

I can probably prepare a Bash/Shell script that will check the output of the SQL statement and then a Monit config that will send me an alert in case of issue. If I do, I will post this as an answer here. I was just wondering whether there was already a known solution to this.

I expect this to work with MySQL.

I intend to test this with those SQL queries:

STOP SLAVE;
START SLAVE;

Solution

  • I got this to work as I expected. I did this on all servers sources of the replication.

    Installing

    Here is the file monit_replication.sh:

    #!/bin/bash   
    user="YOUR_USER"
    password="YOUR_PASSWORD"
    
    result=`echo "show slave status"|mysql -EsB --user="$user" --password="$password"`
    
    contains_with_yes(){
            if echo "$1"|grep -i "$2"|grep -i "Yes" > /dev/null; then
                    echo "$2 ok"
            else
                    echo "$2 not ok"
                    exit 1
            fi
    }
     
    contains_with_yes "$result" "Slave_IO_Running"
    contains_with_yes "$result" "Slave_SQL_Running"
    exit 0
    

    Made this executable:

    chmod 700 /YOUR_PATH_TO_THE_FILE/monit_replication.sh
    

    Added to /etc/monit/conf-enabled/mysql:

     check program MySQL_replication with path "/YOUR_PATH_TO_THE_FILE/monit_replication.sh"
       every 120 cycles
       if status > 0 then alert
       group mysql 
    

    Testing

    Ran this SQL:

    STOP SLAVE;
    

    Restarted monit:

    /etc/init.d/monit restart
    

    Found in /var/log/monit.log:

    [2022-07-04T10:46:55+0000] error    : 'MySQL_replication' status failed (1) -- Slave_IO_Running not ok
    

    Received this email:

    monit alert -- Status failed MySQL_replication
    
    Status failed Service MySQL_replication
    
        Date:        Mon, 04 Jul 2022 10:46:55
        Action:      alert
        Host:        YOUR_HOST
        Description: status failed (1) -- Slave_IO_Running not ok
    
    Your faithful employee,
    Monit
    

    This validates the test. Restoring replication with this SQL query:

    START SLAVE;
    

    Other solutions

    If any simpler solution based on Monit can alert me of a replication failure on MariaDB, I would be happy to accept it as the solution to the question.