
"Temporary failure" on Azure caching service


Starting last night at around 2:00 am, some 8 hours after anybody had touched anything to do with the website, our Azure website began throwing this error:

Error: ErrorCode:SubStatus:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.). Additional Information : The client was trying to communicate with the server: net.tcp://payboardprod.cache.windows.net:22233. ( at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ErrStatus errStatus, Guid trackingId, Exception responseException, Byte[][] payload, EndpointID destination)

Basically, it looks as if our Azure cache server took a dive. But there's no sign of this anywhere in our Azure management console, which shows the caching server in question as up and running just fine. Nor is there any indication of a problem on the Azure service availability dashboard (http://azure.microsoft.com/en-us/support/service-dashboard/). The only hint of trouble is that our Azure cache service started reporting zero requests at around 1:00 am.

[Image: Azure cache graph]

Our beta site, which uses a different caching server but is otherwise configured identically, stayed up through this whole episode.

We just have a BizSpark account, and hence no ability to open support tickets with MS.

We've restored service by disabling external caching, but that's obviously not optimal.

Any suggestions for troubleshooting this?


Solution

  • Wrap your calling code in appropriate protection (try/catch) and then handle the failure at the app tier. The commodity platform offered by any cloud can (and does) have these sorts of issues from time to time. Build in logging, and log somewhere like Azure Diagnostics (http://msdn.microsoft.com/en-us/library/gg433048.aspx) so you can troubleshoot later.
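To illustrate the "catch, log, and cope at the app tier" advice, here is a minimal cache-aside sketch. The real client in the question is the .NET `DataCache` API; this is a language-neutral illustration in Python, and every name in it (`FlakyCache`, `CacheUnavailableError`, `get_with_fallback`) is hypothetical. The assumption is that the cache client raises a specific exception when the cache servers are unreachable. When that happens, the read logs the failure and goes straight to the underlying data source instead of failing the request.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("cache")


class CacheUnavailableError(Exception):
    """Hypothetical stand-in for the exception a real cache client raises
    when its cache servers cannot be reached."""


class FlakyCache:
    """Hypothetical in-memory cache client that can be switched 'offline'
    to simulate an outage like the one described in the question."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}
        self.available = True

    def get(self, key: str) -> Optional[object]:
        if not self.available:
            raise CacheUnavailableError("One or more cache servers are unavailable")
        return self._data.get(key)

    def put(self, key: str, value: object) -> None:
        if not self.available:
            raise CacheUnavailableError("One or more cache servers are unavailable")
        self._data[key] = value


def get_with_fallback(cache: FlakyCache, key: str,
                      load: Callable[[str], object]) -> object:
    """Cache-aside read that degrades to the data source if the cache fails.

    Note: None is treated as a cache miss, so None values are never cached.
    """
    try:
        value = cache.get(key)
    except CacheUnavailableError:
        # Record the failure (e.g. for shipping to a diagnostics store),
        # then skip the cache entirely for this request.
        logger.warning("Cache read failed for %r; using data source", key,
                       exc_info=True)
        return load(key)

    if value is None:
        value = load(key)
        try:
            cache.put(key, value)
        except CacheUnavailableError:
            # A failed write is not fatal: the caller still gets the value.
            logger.warning("Cache write failed for %r", key, exc_info=True)
    return value


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    loads = []

    def load_from_db(key: str) -> str:
        loads.append(key)
        return f"value-for-{key}"

    cache = FlakyCache()
    print(get_with_fallback(cache, "a", load_from_db))  # miss: loaded, then cached
    print(get_with_fallback(cache, "a", load_from_db))  # hit: served from cache
    cache.available = False
    print(get_with_fallback(cache, "a", load_from_db))  # cache down: loaded directly
    print(loads)
```

During a long outage like the one above, every request still pays for one failed cache call before falling back. If that latency matters, a common refinement is to stop calling the cache for a short period after repeated failures (a circuit breaker). That is a design choice layered on top of this sketch, not something the answer prescribes.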