
AppFabric Redundancy


We just tested an AppFabric cluster of 2 servers where we removed the "lead" server. The second server then times out on every request, with the error:

Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>: There is a temporary failure. Please retry later. (One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)

In practice this means that if one server in the cluster goes down, they all go down. (Note that we are not using a Windows cluster, only linking multiple AppFabric cache servers to each other.)

I need the cluster to continue operating even if a single server goes down. How do I do this?

(I realize this question borders on Server Fault territory, but imho developers should know this.)


Solution

  • You'll have to install the AppFabric cache on at least three lead servers for the cache to survive a single server crash. The docs state that the cluster will only go down if the "majority" of the lead servers go down, but in the fine print they explain that 1 out of 2 constitutes a majority. I've verified that removing one server from a cluster with three lead servers works as advertised.
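
To make the arithmetic behind that rule concrete, here is a minimal Python sketch. It is not AppFabric code. It only models the rule as described above, under the assumption that the cluster stays up while strictly more than half of the lead servers are still running (so losing 1 of 2 already counts as losing a majority). The function names are made up for illustration.

```python
def cluster_stays_up(lead_servers: int, failed: int) -> bool:
    """Model of the majority rule described in the answer (an assumption,
    not AppFabric's actual implementation): the cluster stays up only
    while strictly more than half of the lead servers are running."""
    if lead_servers < 1 or not 0 <= failed <= lead_servers:
        raise ValueError("need lead_servers >= 1 and 0 <= failed <= lead_servers")
    running = lead_servers - failed
    return 2 * running > lead_servers


def max_tolerated_failures(lead_servers: int) -> int:
    """Largest number of lead servers that can fail under the same rule."""
    if lead_servers < 1:
        raise ValueError("need lead_servers >= 1")
    return (lead_servers - 1) // 2
```

Under this model, `cluster_stays_up(2, 1)` is false, which matches the failure in the question, while `cluster_stays_up(3, 1)` is true, which matches the three-server test in the answer. Note that a fourth lead server adds nothing by this rule: `max_tolerated_failures(4)` is still 1.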