I have a pretty standard failover setup with Route53:
If Primary is unhealthy, DNS returns the Secondary record. This works perfectly as long as the user is not using the application at the moment of failure.
So:
If Primary is unhealthy and the user starts using the app after failover has been activated, everything is OK (DNS points to the Secondary record).
If Primary becomes unhealthy while the user is using the application, the application keeps trying to reach the old IP address, which is unavailable, so it never switches to the Secondary record.
It seems the DNS result is cached (in Chrome this can be checked at chrome://net-internals/#dns). The user can continue using the app only after a period of inactivity, once no API calls have been made and Chrome's DNS cache entry has expired.
Is there any workaround for this particular case, where Primary becomes unhealthy while a user is using the application? Or how can we make the user experience more pleasant in this case?
Added: we are using Alias records (the TTL of A records on Route53 is always 60 seconds).
It all comes down to TTL. If you set the TTL on your record to 30 seconds, the browser should re-resolve the address about every 30 seconds, which should be acceptable for most cases. Of course, that comes at the cost of a bit of extra latency and slightly higher query charges (though R53 is really cheap). If you need a shorter TTL, you can set one.
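Since the stale address only goes away once the cached entry expires, one way to soften the failure on the client side is to keep retrying a failed request for roughly one TTL window instead of giving up on the first error. Below is a minimal sketch of that idea in Python. It only illustrates the logic (in a browser app you would write the equivalent in JavaScript), and `call_with_retry`, `request`, `ttl_seconds`, `retries` and `sleep` are hypothetical names I made up, not part of Route53 or any library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    ttl_seconds: float = 60.0,
    retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` for roughly one DNS TTL before giving up.

    Hypothetical helper: `request` is assumed to raise OSError
    (for example ConnectionError) while the cached address is unreachable.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    # Spread the retries evenly so the total wait is about one TTL.
    delay = ttl_seconds / retries
    for attempt in range(retries + 1):
        try:
            return request()
        except OSError:
            if attempt == retries:
                raise  # still failing after about one TTL, surface the error
            sleep(delay)  # give the stale DNS entry time to expire
```

With the defaults this makes up to five attempts spaced 15 seconds apart, so the user sees a short "reconnecting" delay rather than a hard error. Whether the next attempt actually picks up the Secondary record still depends on when the client's DNS cache entry expires.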
If you want even more control over it, you'd have to set up your own load balancer that routes to a different region when your machine goes down. That will not save you when EC2 itself fails, though it might buy you enough time to spin up a new instance.