mysql database amazon-web-services amazon-aurora

How can I enable table level filtering on an MySQL Aurora read replica

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Replication.html#AuroraMySQL.Replication.Filters

I'm trying to get this enabled. I have an Aurora cluster. I've added a read replica. I've assigned it a different parameter group.

The parameter I need 'replicate-do-table' doesn't exist in the parameter group, nor do I see a way to add it via the console.

The article shows doing it via CLI, which gives me an error that the JSON is invalid, and as far as I can tell - it isn't.

What am I missing here?

Solution

I think the misunderstanding is a consequence of "replica" being overloaded with so many meanings.

TL;DR - some of the replication discussion in the doc primarily applies when replicating outward from Aurora to some non-Aurora MySQL server. The built-in Aurora replication is physical - comprehensive and efficient, but lacking some of the flexible options of the familiar MySQL replication. There are replication improvements (including filtering) available in Aurora MySQL version 3, so you could do that if your cluster is compatible with MySQL 8.0, but not with MySQL 5.7-compatible or 5.6-compatible ones.

Just on the fundamental JSON syntax level:

--parameters "[{"ParameterName": "replicate-wild-do-table", "ParameterValue": "mydb.orders%,mydb.returns%", "ApplyMethod":"immediate"}]"

maybe the outer quotes surrounding the whole JSON string should be single quotes.

If you still run into issues, it's worth diving deeper into the kinds of replication you can do with Aurora...

When you have an Aurora cluster and you add instance #2 to it, that instance is an "Aurora replica", also known as a "reader instance". It is associated with the same physical storage devices as instance #1 (the writer instance). That means the reader instance has all the same tables, indexes, partitions, and other schema objects as the writer instance. No filtering is possible. But filtering isn't needed, because there is no duplicate storage consumed on the reader instance by the schema objects and the data they contain. It's all being read off the same storage devices. If the writer instance crashes, the reader instance already has all the data available and so can take over very quickly.

The doc page says:

You can use replication filters to specify which databases and tables are replicated with a read replica

A "read replica" is a MySQL server outside of Aurora, that you replicate to using Aurora as the source. That logical replication (as opposed to the physical replication built into Aurora) has the flexibility to do filtering, delayed replication, nifty things like that. The drawbacks are things like CPU overhead to do the replication, the potential for higher replication lag, and the fact that the replicated data consumes extra storage.

The doc page says:

The following limitations apply to replication filtering for Aurora MySQL: Replication filters are supported only for Aurora MySQL version 3

So depending on which MySQL-compatible edition you're running, filtering might or might not be available. You'll need a MySQL 8.0-compatible cluster.

You will see on pages like https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.StorageReliability.html that each Aurora cluster maintains 6 copies of all the cluster data behind the scenes, to guard against hardware failures and to distribute the overhead for reads. But Aurora manages all that invisibly behind the scenes. You don't get a 6x surprise on metrics like "how much storage you are using".

There is even a third kind of replica, an "Aurora read replica". That's when an entire Aurora cluster is a target of incoming replication from a source outside of Aurora. Then the whole Aurora cluster is treated as read-only.