We are looking for a way to provide failover for ACS instances so that, if one data center goes offline, authentication via ACS automatically fails over to another data center.
Background:
We use ACS to transform SAML tokens issued by a custom-developed STS via the WS-Trust protocol. ACS brokers trust between our STS and a number of relying parties developed by third parties. The relying parties are currently configured to connect to a specific ACS instance by its DNS URL.
We have looked into the following:
I don't think there is a realistic, foolproof solution here. As noted, you can create additional namespaces in other datacenters and keep backups of your RP configurations and transformation rules. To recover, you would restore a backup into the new namespace, and your clients would then need to reconfigure their apps to use that namespace. This can work in some scenarios (like Google and Yahoo! integration). It can even work (I think) for Active Directory integration. It is very problematic, however, if you don't control the RP.
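To show why the RP has to be reconfigured, here is a minimal Python sketch of an RP that trusts exactly one ACS namespace as its token issuer. It assumes the issuer URI takes the usual `https://<namespace>.accesscontrol.windows.net/` form; the namespace names, the `RelyingPartyConfig` class, and the token dictionary are hypothetical and only illustrate the issuer check.

```python
# Hypothetical sketch: an RP that trusts a single ACS namespace as issuer.
# Namespace names and the token dict shape are illustrative assumptions.

def acs_issuer(namespace):
    # Assumed ACS issuer URI format for a namespace.
    return f"https://{namespace}.accesscontrol.windows.net/"

class RelyingPartyConfig:
    def __init__(self, trusted_namespace):
        self.trusted_issuer = acs_issuer(trusted_namespace)

    def accepts(self, token):
        # Real RPs typically also validate the token-signing certificate;
        # this sketch only compares the issuer.
        return token["issuer"] == self.trusted_issuer

rp = RelyingPartyConfig("contoso-primary")
token_from_backup = {"issuer": acs_issuer("contoso-backup")}
print(rp.accepts(token_from_backup))  # False: restored namespace is not trusted

# Only after the RP owner updates its configuration is the backup accepted.
rp = RelyingPartyConfig("contoso-backup")
print(rp.accepts(token_from_backup))  # True
```

The restored namespace is a different issuer, so nothing changes for the RP until someone who controls it edits its trust configuration, which is exactly the step you cannot force on third-party RPs.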
A different but (for us, at least) equally blocking problem with this approach is that it doesn't work with Windows Live name identifier claims: Windows Live issues a different name identifier for the same user in each namespace. So even if we restored all our settings in another datacenter (and we control the RPs too!), our Windows Live users would be unable to log in correctly, because their name identifiers would no longer match under the new namespace. Google and Yahoo! would not have this problem, since they can use a stable claim (like email).
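The name-identifier problem can be made concrete with a minimal Python sketch, assuming the RP maps incoming claims to local accounts with a simple dictionary lookup. The `lookup_user` helper, the account stores, and the identifier values are hypothetical; the two claim-type URIs are the ones WIF uses for name identifier and email.

```python
# Hypothetical sketch: account lookup keyed by a per-namespace identifier
# breaks after failover, while a stable claim such as email keeps working.

NAME_ID = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier"
EMAIL = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"

def lookup_user(accounts_by_key, claims, claim_type):
    """Return the local account for the incoming claims, or None."""
    return accounts_by_key.get(claims.get(claim_type))

# Accounts registered while the primary namespace was in use (made-up values).
accounts_by_name_id = {"wl-id-via-namespace-A": "alice"}
accounts_by_email = {"alice@example.com": "alice"}

# Windows Live: same person, different identifier per namespace.
windows_live_primary = {NAME_ID: "wl-id-via-namespace-A"}
windows_live_backup = {NAME_ID: "wl-id-via-namespace-B"}
print(lookup_user(accounts_by_name_id, windows_live_primary, NAME_ID))  # alice
print(lookup_user(accounts_by_name_id, windows_live_backup, NAME_ID))   # None

# Google/Yahoo!: email is the same whichever namespace issued the token.
google_backup = {EMAIL: "alice@example.com"}
print(lookup_user(accounts_by_email, google_backup, EMAIL))  # alice
```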
Basically, it appears you are mostly at the mercy of the datacenter operations team to fail over to the subregion quickly in the event of a total datacenter loss.