amazon-s3 amazon-rds high-availability amazon-aurora fault-tolerance

AWS Multi Region Service Availability and Operations


Some AWS services can replicate data between Regions, e.g. S3 (Cross-Region Replication, CRR) and RDS (cross-Region read replicas).

  1. In S3 CRR, what happens if the destination Region goes down? Does replication catch up automatically once the Region is back up?

  2. (Edited) Can CRR be enabled both ways, e.g. active-active?

Similarly, for an RDS MySQL read replica (RR) hosted in a different Region:

  1. If the RR instance or the destination Region goes down, does it affect the master in the other Region?
  2. Once the instance is replaced or the Region is back up, does the RR catch up on the changes the master made during the outage?
  3. How does Aurora differ from RDS MySQL in these areas?

Solution

  • In S3 cross-region replication, if the destination region goes down or connectivity is disrupted, replication of objects is delayed until the issue is resolved, and then resumes and catches up.

    Cross-region replication can be used as active/active, but there is no conflict resolution: if you write different objects with the same key to both regions at about the same time, which version ends up as the "final current version" in each region is undefined. As long as you aren't doing that, there's no problem.

    What you can't do is configure more than 2 regions in a ring, because A > B > C > A would only replicate one hop. Objects created in A would replicate A > B, but not B > C, because an object created by the replication process is not replicated further. That is, objects replicated into a bucket are never also replicated out of the bucket. Likewise, objects created directly in B would replicate B > C but not C > A.
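
To make the one-hop rule concrete, here is a small pure-Python simulation of the A > B > C > A ring. It does not call AWS at all; the `Bucket` class, its `put` method and the `is_replica` flag are hypothetical names invented for this sketch, and the only behaviour modelled is the one described above (an object written by replication is not replicated again).

```python
class Bucket:
    """Toy stand-in for an S3 bucket with a single replication destination."""

    def __init__(self, name):
        self.name = name
        self.objects = {}        # key -> (body, is_replica)
        self.destination = None  # next bucket in the replication chain, if any

    def put(self, key, body, is_replica=False):
        self.objects[key] = (body, is_replica)
        # Only objects written directly by a client are replicated.
        # Objects that arrived via replication are not forwarded.
        if not is_replica and self.destination is not None:
            self.destination.put(key, body, is_replica=True)


a, b, c = Bucket("A"), Bucket("B"), Bucket("C")
a.destination, b.destination, c.destination = b, c, a  # ring A > B > C > A

a.put("from-a.txt", "written in A")
b.put("from-b.txt", "written in B")

for bucket in (a, b, c):
    print(bucket.name, sorted(bucket.objects))
# A ['from-a.txt']
# B ['from-a.txt', 'from-b.txt']
# C ['from-b.txt']
```

The object written in A never reaches C, and the object written in B never reaches A, which is why a three-region ring does not give you full replication.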

    If an RDS cross-region replica fails or becomes inaccessible, the master is unaffected. Under the hood, the replica listens to a stream of change messages from the master but does not acknowledge having applied those changes to its local data set, so if a replica disappears, it's a non-event from the master's perspective. Because the replication stream contains sequencing/position markers, the replica knows where it left off and, when it reconnects, asks for the stream starting from the correct position.
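
The resume-from-position behaviour can be sketched with a toy change log. This is an illustration only, not how MySQL or RDS is implemented; `Primary`, `Replica`, `stream_from` and `sync` are hypothetical names, and the list index simply plays the role of the position marker in the replication stream.

```python
class Primary:
    """Toy primary that appends every change to an ordered log."""

    def __init__(self):
        self.log = []  # the list index acts as the stream position

    def write(self, key, value):
        # The primary never waits for, or hears back from, any replica.
        self.log.append((key, value))

    def stream_from(self, position):
        return self.log[position:]


class Replica:
    """Toy replica that remembers the next position it needs to apply."""

    def __init__(self):
        self.data = {}
        self.position = 0

    def sync(self, primary):
        # Ask for the stream starting where this replica left off.
        for key, value in primary.stream_from(self.position):
            self.data[key] = value
            self.position += 1


primary, replica = Primary(), Replica()
primary.write("a", 1)
replica.sync(primary)   # replica is connected and up to date

# Replica (or its region) is unavailable; the primary keeps accepting writes.
primary.write("b", 2)
primary.write("a", 3)

replica.sync(primary)   # replica reconnects and resumes at position 1
print(replica.position, replica.data)  # 3 {'a': 3, 'b': 2}
```

Nothing in `Primary.write` depends on the replica, which mirrors the point that a missing replica is a non-event for the master.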

    The replica will catch up when service/connectivity is restored, but not instantaneously. The time required depends on the amount of changed data that needs to replicate and on the capacity of the replica. This is true for standard RDS as well as Aurora: in both cases, cross-region replication is asynchronous.
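
As a rough back-of-the-envelope model of that catch-up time (an assumption for illustration, not an AWS formula), you can divide the backlog by the rate at which the replica gains on the master. The function name, parameters and example numbers below are all hypothetical.

```python
def catch_up_seconds(backlog_mb, apply_mb_per_s, incoming_mb_per_s=0.0):
    """Estimate the time for a replica to drain its replication backlog.

    backlog_mb        -- changed data accumulated during the outage
    apply_mb_per_s    -- rate at which the replica can apply changes
    incoming_mb_per_s -- rate of new changes still arriving from the master
    Returns None if the replica cannot gain on the master at all.
    """
    net_rate = apply_mb_per_s - incoming_mb_per_s
    if net_rate <= 0:
        return None
    return backlog_mb / net_rate


# 6000 MB of backlog, replica applies 25 MB/s, master still produces 5 MB/s.
print(catch_up_seconds(6000, 25, 5))  # 300.0
```

If the replica's apply rate is not higher than the master's ongoing change rate, the lag never shrinks, which is why replica capacity matters as much as the size of the backlog.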