Tags: postgresql, amazon-web-services, database-design, clone, amazon-aurora

Amazon Aurora PostgreSQL: clone capability: downsides?


I have an Amazon Aurora PostgreSQL-compatible database, running as a "live" pilot instance.

I'm planning a formal production transition for early next year, which I had imagined would include the creation of development and test instances, snapshot restores to get started, and so on. I also have an immediate need to make some data model refinements that could affect existing views and procedures, and I'm reluctant to do this in the "live" instance, even though downtime would have no real impact at the moment.

I've read the Amazon docs about Aurora cloning, but have failed to find any "real-world" articles or posts about using it in practice. The one non-Amazon article I found really just restates the Amazon summary.

Does anyone have any direct experience of this capability? Or inside knowledge of the mechanics? Specifically:

  1. Can you make DDL (schema) changes to each instance independently? There's no mention of this in the documentation. I'm not sure if the use of the term "clone" implies that they remain structurally identical, but given the use cases cited I can't imagine you are basically freezing the DB structure by cloning it.
  2. Is there any performance impact (given the distribution of storage between "frozen" shared pages and instance-specific pages)?
  3. If you create a clone of a database and then later remove that clone, have you irreversibly changed the storage pattern for the original database (including any performance implications of the process)?
  4. Does it change the behaviour of deletes under the hood? I'm ignorant of the way Aurora storage works (and have only patchy knowledge of database storage in general), but in the old days, storage could be reclaimed for deleted data. In this model, if you clone a database, then delete a few rows from a table, what happens?

I'm going to test it by creating an "old-fashioned clone" (snapshot restore to a new instance), then cloning that, but any insights in the meantime gratefully received!
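
For anyone following along: the clone itself is created via the RDS API's point-in-time restore with a copy-on-write restore type. Here's a minimal boto3 sketch of the test I'm planning; all cluster and snapshot identifiers and the instance class are placeholders:

```python
import boto3

rds = boto3.client("rds")

# Step 1: the "old-fashioned clone" -- restore a snapshot into a new cluster.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="pilot-restored",     # placeholder name
    SnapshotIdentifier="pilot-snapshot",      # placeholder snapshot
    Engine="aurora-postgresql",
)

# Step 2: an Aurora clone of that cluster -- a point-in-time restore with
# RestoreType="copy-on-write", which shares unmodified storage pages with
# the source cluster instead of copying them.
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="pilot-clone",        # placeholder name
    SourceDBClusterIdentifier="pilot-restored",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# Both calls create only the cluster; each still needs at least one instance.
rds.create_db_instance(
    DBInstanceIdentifier="pilot-clone-writer",
    DBClusterIdentifier="pilot-clone",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",           # placeholder instance class
)
```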


Solution

    1. Yes, you can make schema changes to the clone, and they do not affect the original database at all. Be aware, though, that a schema change which rewrites a table forces the clone to copy every page of that table, because each original page has to be modified for the clone's version (see the sketch after this list).
    2. That depends. We have seen that if you modify the schema of a large table, the clone can become very slow. I haven't had an official explanation, but I assume it's because the clone has to link through the original page pointers to reach its own copies, which is fine for a small table, or for a relatively small number of pages in a large table; but once the entire table has essentially been copied due to a schema change, we've seen a sub-second SELECT query take 80 seconds on the clone. I will say that the schema change itself did not take longer than expected.
    3. No, the original database's pages are never touched by the clone. They are shared until the clone modifies them, at which point they are copied for the clone's use only. If you later delete the entire original or the entire clone, that's fine too: the two databases work as if they were completely independent of each other; they only share unaltered pages.
    4. No, basically the same answer as (3). If you delete rows in the clone, the pages that contain those rows are copied for the clone and left untouched in the original.
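
    To make (1) and (2) concrete, here's a rough sketch of the kind of DDL we run against a clone. It uses psycopg2 with placeholder endpoints and table names; the point is the difference between a catalog-only change and one that rewrites the table:

```python
import psycopg2

# Connect to the CLONE's endpoint -- the original cluster is never touched.
# Host, credentials and table names are all placeholders.
conn = psycopg2.connect(
    host="pilot-clone.cluster-xxxx.eu-west-1.rds.amazonaws.com",
    dbname="appdb",
    user="admin",
    password="...",
)

with conn.cursor() as cur:
    # Cheap DDL: adding a nullable column is a catalog-only change in
    # PostgreSQL, so almost no storage pages get copied for the clone.
    cur.execute("ALTER TABLE orders ADD COLUMN notes text;")

    # Expensive DDL: changing int to bigint rewrites the whole table, so
    # every page of "orders" gets copied for the clone -- this is the
    # scenario where we then saw very slow reads on the clone.
    cur.execute("ALTER TABLE orders ALTER COLUMN quantity TYPE bigint;")

conn.commit()
conn.close()
```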

    We're using clones for development and staging copies of production, as you describe, and it works great, but as I said there is one scenario (a schema change to a large table) where we've seen very poor performance. In general, performance has been fine: we don't see any notable difference for regular INSERTs, UPDATEs or DELETEs. It would likely be more noticeable if you ran a huge UPDATE that touched most of the rows in a large table, but for regular application work it performs well.
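
    If you want to check whether a clone has drifted into that slow-read scenario, one rough approach is to time the same query against both endpoints. A minimal sketch, again with placeholder endpoints, credentials and query:

```python
import time
import psycopg2

# Placeholder endpoints and query -- adjust for your own clusters.
ENDPOINTS = {
    "original": "pilot-restored.cluster-xxxx.eu-west-1.rds.amazonaws.com",
    "clone": "pilot-clone.cluster-xxxx.eu-west-1.rds.amazonaws.com",
}
QUERY = "SELECT count(*) FROM orders;"

for name, host in ENDPOINTS.items():
    conn = psycopg2.connect(host=host, dbname="appdb", user="admin", password="...")
    with conn.cursor() as cur:
        start = time.perf_counter()
        cur.execute(QUERY)
        cur.fetchall()
        elapsed = time.perf_counter() - start
    conn.close()
    # A large gap between the two timings suggests the clone is reading
    # through heavily-copied pages, as described above.
    print(f"{name}: {elapsed:.2f}s")
```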