Search code examples
javamysqlconnection-poolingapache-commons-dbcpamazon-aurora

Evicting connections to a read-only node in a cluster from the connection pool


My application connects to a fail-over cluster of two MySQL 5.6 (actually, Amazon Aurora) instances. The active node is always write-accessible, while the passive one is running in read_only mode (this is different from the canonical MySQL fail-over cluster, where all slave nodes are write-accessible by default). Amazon RDS provides a symbolic DNS name which always points to the IP address of the active MySQL node.

During the fail-over process, the former master is restarted in read_only mode, while the former passive node becomes write-accessible and is promoted to master. Also, the DNS records are changed, so that the DNS name of a cluster is now pointing to the new master node.

Even if I totally disable DNS caching on Java side (via sun.net.inetaddr.ttl or networkaddress.cache.ttl), OS-specific DNS caching is still effective, so after the database fail-over I end up having my DBCP pool full of connections to the read-only MySQL instance. These connections are valid, i. e. they have been obtained after the fail-over completed but before the DNS cache expired. Additionally, none of such connections has readOnly flag set, so I can't tell whether I'm talking to a read-only instance until I execute some DML, which is when ER_OPTION_PREVENTS_STATEMENT comes in all its glory. Even if I explicitly put a connection to a read-write mode by calling setReadOnly(false) and set the readOnlyPropagatesToServer flag, this only results in a driver sending SET SESSION TRANSACTION READ WRITE to the server, which doesn't lead to any exception thrown.

I would like to solve this problem with as little application logic affected as possible. This could be accomplished if there were a way to treat a connection to a read-only instance as an invalid/closed connection (i. e. evict it from the pool).

Can I have a validation query such as SHOW GLOBAL VARIABLES LIKE 'read_only' with an additional logic tied to it? Is it possible to affect the pool's behavior w.r.t a connection based on which scalar value the validation query returns?


Solution

  • The following validation query can be used:

    select case when @@read_only = 0 then 1 else (select table_name from information_schema.tables) end as `1`
    

    If the database is running in read-only mode, the query will fail with

    ERROR 1242 (21000): Subquery returns more than 1 row
    

    Since Amazon Aurora sets innodb_read_only but not read_only on the reader endpoint in a cluster, the validation query can be rewritten as

    select case when @@read_only + @@innodb_read_only = 0 then 1 else (select table_name from information_schema.tables) end as `1`
    

    Inspired by this answer.