I'm setting up an Azure Traffic Manager profile, in Priority mode, for my website. I have a primary and a failover location, both monitored by a "FailoverMonitor.aspx" page - if any resources are down for the corresponding resource/region, the page returns a 500 error. I also wanted to make sure an error message is returned to the user if all locations are down.
In my testing, I decided to break both my primary (priority 1) and failover (priority 2), and in doing so, I saw that the primary location was served up.
This kind of surprised me. I half expected the site to not return anything at all, but instead it served up a site that is considered to be in a "Degraded" status.
I added a 3rd endpoint to the traffic manager that returns a "sorry we're down" page - but is this the intended methodology to return such a message? I just want to make sure I'm going through all intended steps and not misusing the service. Thanks!
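When all the endpoints being monitored by Traffic Manager for a given profile are down, it makes a "best effort" attempt and responds as if all of those endpoints were actually online, instead of not returning any endpoint at all. In Priority mode that means the priority 1 endpoint is returned, which is why you saw your primary location being served.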
More details of this and other endpoint monitoring details can be found at: https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/
The relevant section is quoted below:
What happens if all Traffic Manager endpoints (excluding endpoints with a Disabled or Stopped status) are failing their health checks, and show a Degraded status? This most commonly is caused by an error in the configuration of the service (such as an access control list [ACL] blocking the Traffic Manager health checks), or an error in the configuration of the Traffic Manager profile (such as an incorrect monitoring path). In this case, Traffic Manager makes a "best effort" attempt and responds as if all the Degraded status endpoints actually are in an online state. This is preferable to the alternative, which would be to not return any endpoint in the DNS response. A consequence of this behavior is that if Traffic Manager health checks are not configured correctly, it might appear from the traffic routing as though Traffic Manager is working properly. However, in this case, endpoint failover will not happen if an endpoint fails, and this affects overall application availability. To ensure that this does not occur, it is important to check that the profile shows an Online status, and not a Degraded status. An Online status shows that the Traffic Manager health checks are working as expected.
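To make the behavior described above concrete, here is a minimal Python sketch of how Priority routing with the "best effort" fallback could be modeled. This is not Traffic Manager's actual implementation; the Endpoint class, the pick_endpoint function, and the endpoint names are all hypothetical and only illustrate the rules from the quoted documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    priority: int  # lower number = higher priority
    status: str    # "Online", "Degraded", "Disabled", or "Stopped"


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Hypothetical model of Priority routing with the best-effort fallback."""
    # Disabled or Stopped endpoints are excluded entirely.
    candidates = [e for e in endpoints if e.status not in ("Disabled", "Stopped")]
    online = [e for e in candidates if e.status == "Online"]
    # Best effort: if nothing is Online, treat every Degraded endpoint as Online.
    pool = online or candidates
    # Return the highest-priority endpoint in the pool, or None if the pool is empty.
    return min(pool, key=lambda e: e.priority, default=None)


# Both real sites failing their health checks: the primary is still returned.
both_down = [
    Endpoint("primary", 1, "Degraded"),
    Endpoint("failover", 2, "Degraded"),
]
print(pick_endpoint(both_down).name)

# Adding a third, always-healthy "sorry" endpoint: it is the only Online one, so it wins.
with_sorry_page = both_down + [Endpoint("sorry-page", 3, "Online")]
print(pick_endpoint(with_sorry_page).name)
```

Under this model, a third endpoint with the lowest priority is only returned once every higher-priority endpoint is Degraded, which matches what the question describes wanting. It also means the "sorry" endpoint has to pass its own health check, otherwise the profile falls back to best effort and the primary is returned again.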