azure, load-balancing, azure-load-balancer

Is an Azure Global Load Balancer a single-point-of failure in a high availability scenario?


I'm evaluating options for building a high availability, safety-critical system in Azure. For this application it is highly important to be as close to 24/7 availability as it can possibly get. I came across the Global Load Balancer; the first line of its documentation states:

Azure Standard Load Balancer supports cross-region load balancing enabling geo-redundant high availability scenarios such as: [...]

The documentation shows the following architecture:

[Image: cross-region load balancer architecture diagram from the documentation]

Can I rely on the cross-region load balancer in this picture being truly highly available in such a scenario, and not a single point of failure? As I understand from the documentation, the cross-region load balancer has a home region and participating regions where the global public IP address is advertised. When one Azure region goes down, traffic flow is unaffected.

This should actually be enough for high availability, as it is already multi-regional. What still leaves me sceptical is the provided SLA of 99.99% for the load balancer (which allows about 4 minutes of downtime per month). I understand that the SLA is a legal contract and not a technical guarantee. The 10% compensation doesn't help much in this system when availability drops below 99.99%. I presume that the "real" technical availability of the cross-region load balancer must actually be much higher, but that for legal reasons Azure doesn't commit to more than 99.99%.

On the other hand, I don't want to reinvent the wheel by manually introducing some kind of active/active redundancy over multiple regions, as I assume that Azure has put more thorough thought into how to achieve high availability.

What the question comes down to is: Can I trust that the cross-region load balancer achieves at least as much availability as I could achieve on my own by manually building some kind of active/active redundancy over multiple regions?


Solution

  • Yes, the Azure cross-region load balancer is highly available and not a single point of failure. It is optimized for ultra-low latency traffic distribution. With a single global anycast IP, you can add all your application’s regional load balancers to achieve high availability. If one region fails, traffic is automatically routed to the closest healthy regional load balancer, with no intervention needed from you. With automatic health probes and failovers, you can achieve high availability and regional redundancy for your applications.

    The home region, where the cross-region load balancer or Public IP Address of the Global tier is deployed, does not affect how traffic is routed. If the home region goes down, traffic flow remains unaffected.

    The Azure cross-region Load Balancer is backed by a 99.99% availability SLA, similar to the regional tier load balancer, Azure Front Door, and Azure Traffic Manager. The SLA provided by the selected global routing service represents the maximum attainable composite SLA, regardless of how many deployment regions are considered. Some third-party global routing services advertise a 100% SLA, but the historic and attainable availability of these services is typically lower than 100%.

    The guidance in "Networking and connectivity for mission-critical workloads on Azure" recommends the use of multiple active regional deployment stamps with a global routing service to distribute traffic to each active stamp. Azure Front Door, Azure Traffic Manager, and Azure Standard Load Balancer provide the needed routing capabilities to manage global traffic across a multi-region application.

    So, you can rely on the high availability provided by the above Azure global load balancers.

    For more clarity, you can refer to the documents below:

    There is no research paper available, but you can take a look at the TechNet blog below regarding Microsoft SDN Software Load Balancers:

    If you still need more information on the architecture or SLA, you should open a support case with Microsoft and connect with your technical account manager or the product group team.
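
To make the SLA point above concrete, here is a minimal sketch of the arithmetic behind "about 4 minutes per month" and behind the statement that the global routing service's SLA caps the composite SLA. It uses the common series/parallel availability model, which assumes independent failures; that model is a simplification and is not taken from the Azure documentation. The 99.9% per-region figure and all function names are hypothetical, chosen only for illustration.

```python
from typing import Iterable

# A 30-day month, the period the "about 4 minutes" estimate is based on.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_minutes(availability: float,
                     period_minutes: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime budget implied by an availability fraction (e.g. 0.9999)."""
    return (1.0 - availability) * period_minutes


def parallel_availability(availabilities: Iterable[float]) -> float:
    """Redundant components: the group is up if at least one member is up.

    Assumes failures are independent, which real regions only approximate.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def serial_availability(availabilities: Iterable[float]) -> float:
    """Chained components: the chain is up only if every member is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


GLOBAL_LB_SLA = 0.9999   # from the SLA quoted above
REGIONAL_STAMP = 0.999   # hypothetical availability of one regional deployment

# Two active regional stamps behind one global load balancer.
stamps = parallel_availability([REGIONAL_STAMP, REGIONAL_STAMP])
composite = serial_availability([GLOBAL_LB_SLA, stamps])

if __name__ == "__main__":
    # 99.99% over a 30-day month leaves about 4.32 minutes of downtime.
    print(f"Downtime budget at 99.99%: {downtime_minutes(GLOBAL_LB_SLA):.2f} min/month")
    print(f"Two redundant stamps:      {stamps:.6f}")
    print(f"Composite with global LB:  {composite:.10f}")
```

Under these assumptions, adding more regions pushes the availability of the stamps toward 1, but the composite can never exceed the 99.99% of the global routing layer that sits in front of them. That is why the contractual SLA of the global load balancer is the ceiling for the whole design, independent of how highly available the service is in practice.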