
Troubleshooting MariaDB Galera Cluster latency with Apache fetch statements


I'm currently trying to figure out an issue with a MariaDB Galera cluster. We recently added the cluster to our real-time system. Since adding it, the longer the website stays open, the longer fetch calls to the database take to return. For example, the PHP fetch runs a really simple SQL statement:

<?php
//echo 'before: '.microtime(true);
$dbquery = "SELECT * FROM `MyTable` WHERE Column = ? ORDER BY Id DESC LIMIT 1";
$result = $db->prepare($dbquery);
$result->execute([$value]); // bind $value instead of interpolating it into the SQL string
$myResults = $result->fetchAll();
//echo 'after: '.microtime(true);

echo json_encode($myResults);
?>

//javascript
this.ahaTimeout = setTimeout(() => {
  this.ahaInterval = setInterval(() => {
    fetch("myDBcall.php", {
      method: 'post',
      headers: {
        "Content-type": "application/x-www-form-urlencoded; charset=UTF-8"
      }
    }).then(response => response.json()).then(data => {
      //do stuff
    });
  }, 1000);
}, 1000 - new Date().getMilliseconds());

This call returns about 50 columns' worth of data, and the table is cleared on website load, so it never gets very large.

The call starts out at about 20ms, but within 20 minutes it is around 200ms. Users can keep the website open for hours at a time, and since it's fetching every second, requests pile up (effectively a memory leak) as soon as a fetch takes longer than a second to return.
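One way to keep slow responses from piling up (a sketch, not from the original code; `createGuardedPoller` and its parameter names are invented for illustration) is to guard the interval so a new fetch is only issued once the previous one has settled:

```javascript
// Minimal sketch: skip interval ticks while a request is still in flight,
// so a database response slower than the poll period cannot cause
// overlapping fetches to accumulate.
function createGuardedPoller(task) {
  let inFlight = false;
  return async function tick() {
    if (inFlight) return false;   // previous request not finished: skip this tick
    inFlight = true;
    try {
      await task();               // e.g. fetch("myDBcall.php", ...)
      return true;
    } finally {
      inFlight = false;           // allow the next tick to run
    }
  };
}
```

This would drop into the existing code as `this.ahaInterval = setInterval(createGuardedPoller(myFetch), 1000);`, trading occasional skipped polls for a bounded number of in-flight requests.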

We did not have any issues prior to the Galera cluster, when we were using a single database. To troubleshoot, I made sure all the programs connect to the same node, and I still see the issue.

The cluster and the workstation with the website open are on the same network, and are not going through any BIG-IP F5s or other traffic managers.

The cluster nodes are also connected via a 10Gb network switch, so syncing between nodes is relatively fast. I also do not see any increase in latency on my development PC, which has a single MariaDB instance and Apache web server installed. There is also only one website tab open at a time, so the database isn't being spammed with connects/reconnects during the test. (We are working on a separate fix for this by implementing memcache.)

Below are our configuration files for the cluster. my.cnf:

[client-server]
 !includedir /etc/my.cnf.d
[mysqld]
query_cache_type=0
query_cache_size=0
max_allowed_packet=768M

server.cnf:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
bind-address=<nodeIPaddress>
user=mysql

innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=128M

log-error=/var/log/mysqld/log
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so

wsrep_node_name='mynode'
wsrep_node_incoming_address='<nodeIPaddress>'
wsrep_node_address='<10GBnetworkIPaddress>'

wsrep_cluster_name='myCluster'
wsrep_cluster_address="gcomm://<10GBnode1IP>:4567,<10GBnode2IP>:4567,<10GBnode3IP>:4567"

wsrep_provider_options='gcache.size=300M;gcache.page_size=300M;gcs.max_packet_size=1024000'
wsrep_slave_threads=4
wsrep_sst_method=rsync

Any help would be appreciated.


Solution

  • We narrowed the issue down to the BIG-IP F5 virtual server that was load balancing the Galera MariaDB cluster. For some reason, reading from the database through the virtual server caused severe lag (but only reads). We fixed the issue by changing our website and Java programs to fail over directly to the database nodes, bypassing the virtual server.
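The client-side failover described above could look something like this sketch (the `fetchWithFailover` helper, the endpoint list, and the injectable `fetchImpl` parameter are all invented for illustration; they are not from the original code):

```javascript
// Sketch: try each database node's web endpoint in order, bypassing the
// F5 virtual server. Falls through to the next node on a network error
// or a non-OK HTTP status; throws only if every node fails.
async function fetchWithFailover(path, options, endpoints, fetchImpl = fetch) {
  let lastError;
  for (const base of endpoints) {
    try {
      const response = await fetchImpl(base + path, options);
      if (response.ok) return response;       // first healthy node wins
      lastError = new Error(`HTTP ${response.status} from ${base}`);
    } catch (err) {
      lastError = err;                        // node unreachable: try the next one
    }
  }
  throw lastError;
}
```

A call site would then list the node addresses directly, e.g. `fetchWithFailover("/myDBcall.php", { method: 'post' }, ["http://node1", "http://node2", "http://node3"])`, instead of pointing at the virtual server's address.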