My application connects to a fail-over cluster of two MySQL 5.6 (actually, Amazon Aurora) instances. The active node is always write-accessible, while the passive one is running in read_only
mode (this is different from the canonical MySQL fail-over cluster, where all slave nodes are write-accessible by default). Amazon RDS provides a symbolic DNS name which always points to the IP address of the active MySQL node.
During the fail-over process, the former master is restarted in read_only
mode, while the former passive node becomes write-accessible and is promoted to master. Also, the DNS records are changed, so that the DNS name of a cluster is now pointing to the new master node.
Even if I totally disable DNS caching on Java side (via sun.net.inetaddr.ttl
or networkaddress.cache.ttl
), OS-specific DNS caching is still effective, so after the database fail-over I end up having my DBCP pool full of connections to the read-only MySQL instance. These connections are valid
, i. e. they have been obtained after the fail-over completed but before the DNS cache expired. Additionally, none of such connections has readOnly
flag set, so I can't tell whether I'm talking to a read-only instance until I execute some DML, which is when ER_OPTION_PREVENTS_STATEMENT
comes in all its glory. Even if I explicitly put a connection to a read-write mode by calling setReadOnly(false)
and set the readOnlyPropagatesToServer
flag, this only results in a driver sending SET SESSION TRANSACTION READ WRITE
to the server, which doesn't lead to any exception thrown.
I would like to solve this problem with as little application logic affected as possible. This could be accomplished if there were a way to treat a connection to a read-only instance as an invalid/closed connection (i. e. evict it from the pool).
Can I have a validation query such as SHOW GLOBAL VARIABLES LIKE 'read_only'
with an additional logic tied to it? Is it possible to affect the pool's behavior w.r.t a connection based on which scalar value the validation query returns?
The following validation query can be used:
select case when @@read_only = 0 then 1 else (select table_name from information_schema.tables) end as `1`
If the database is running in read-only mode, the query will fail with
ERROR 1242 (21000): Subquery returns more than 1 row
Since Amazon Aurora sets innodb_read_only
but not read_only
on the reader endpoint in a cluster, the validation query can be rewritten as
select case when @@read_only + @@innodb_read_only = 0 then 1 else (select table_name from information_schema.tables) end as `1`
Inspired by this answer.