amazon-web-services amazon-elasticache redis-cluster

Is there downtime when upgrading an AWS ElastiCache cluster?


We are currently using Redis Engine 7.1 on our AWS ElastiCache cluster. We are looking to upgrade the cluster, but we don't know whether any downtime is involved. How long does the upgrade take on average, and would writes be disrupted during this period?

I tried to test the upgrade, but the upgrade time was inconsistent between runs. Disruptions look minimal, but I want to understand the full impact.


Solution

  • In general, for each shard: General time More detailed: general time

    Meaning: you won't have any read downtime if you read from a replica.
    You will have write downtime of a few seconds, one shard at a time. Generally, the downtime lasts from the moment the master is killed until one of the replicas is chosen as the new master, which usually happens quickly.

    Something that can add to the downtime is how the client handles the topology change. If the client doesn't know how to handle failovers, or topology changes in general, the impact can be more drastic.
    Some clients don't recognize a failover if the IP doesn't change; they treat it as a disconnection and reconnect once the old master is up again, thinking it is still the master.
    If there's a lot of data to sync, the master might be in a sync state, and the client will queue requests; only when the sync is done will the client get an error and refresh its view of the topology.
    So to really test this, you should mimic the upgrade on a cluster with a similar data size.
    Which client do you use?

    Resource: https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/VersionManagement.html
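
To put numbers on the write disruption while testing an upgrade, a small probe along these lines may help. It is only a sketch using the Python standard library: `probe_writes`, `Outage`, and the `write` callable are hypothetical names introduced here, not part of any Redis client. The probe calls `write()` at a fixed interval and records each window during which the call raised, which roughly corresponds to the per-shard failover gap described above plus whatever time the client needs to notice the new topology.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outage:
    start: float  # clock value at the first failed write in this window
    end: float    # clock value at the first successful write afterwards

    @property
    def duration(self) -> float:
        return self.end - self.start


def probe_writes(
    write: Callable[[], None],
    total_seconds: float,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Outage]:
    """Call write() repeatedly and return the windows in which it raised."""
    outages: List[Outage] = []
    failing_since: Optional[float] = None
    deadline = clock() + total_seconds

    while True:
        now = clock()
        if now >= deadline:
            break
        try:
            write()
        except Exception:
            # Any exception counts as "write unavailable" for this probe.
            if failing_since is None:
                failing_since = now
        else:
            if failing_since is not None:
                outages.append(Outage(failing_since, now))
                failing_since = None
        sleep(interval)

    # A failure window still open at the deadline is closed at the current time.
    if failing_since is not None:
        outages.append(Outage(failing_since, clock()))
    return outages
```

In practice `write` would wrap a single write from your own client, for example a `SET` on a dedicated probe key, with a short socket timeout so that a hung request is counted as a failure rather than blocking the loop. Run it against a test cluster of similar data size while the upgrade is in progress, then inspect `duration` for each returned `Outage`. The measured gaps include client-side reconnect and topology-refresh time, so results can differ between clients.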