
Creating a High Availability AppFabric Cache Cluster


Is anything needed besides setting Secondaries=1 in the cluster configuration to enable High Availability, specifically in the cache client configuration?

Our configuration:

With the above configuration, we see primary and secondary regions created on the three hosts. However, when one of the hosts is stopped, the following exceptions occur:

  • ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
  • An existing connection was forcibly closed by the remote host
  • No connection could be made because the target machine actively refused it 192.22.0.34:22233

Isn't the point of High Availability to keep serving requests when a host goes down? We are using a named region; does that break High Availability? I read somewhere that a named region can only exist on one host (though I did verify that a secondary exists on another host). I feel like we're missing something in the cache client configuration that would enable High Availability. Any insight would be greatly appreciated.


Solution

  • After opening a ticket with Microsoft, we narrowed the problem down to our static DataCacheFactory object, which is created once and then reused for the lifetime of the application.

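    using Microsoft.ApplicationServer.Caching; // AppFabric cache client: DataCacheFactory, DataCache

    public class AppFabricCacheProvider : ICacheProvider
    {
        private static readonly object Locker = new object();
        private static AppFabricCacheProvider _instance;
        // A single DataCache, shared by every caller for the lifetime of the app domain.
        private static DataCache _cache;
    
        private AppFabricCacheProvider()
        {
        }
    
        public static AppFabricCacheProvider GetInstance()
        {
            lock (Locker)
            {
                if (_instance == null)
                {
                    // The factory and cache are built exactly once, on first use.
                    _instance = new AppFabricCacheProvider();
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                }
            }
            return _instance;
        }
        ...
    }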
    

    The AppFabric trace logs show the clients still trying to connect to every host, including the one that was stopped, instead of handling the lost host. Resetting IIS on the client machines forces a new DataCacheFactory to be created (in our App_Start) and stops the exceptions.

    The Microsoft engineers agreed that a single, shared DataCacheFactory is the recommended best practice, and we found several articles saying the same (see link and link).

    They are continuing to investigate a solution for us. In the meantime, we have come up with the following temporary workaround, which forces a new DataCacheFactory object to be created whenever we encounter one of the above exceptions.

    using System;
    using System.Net.Sockets;                  // SocketException
    using Microsoft.ApplicationServer.Caching; // DataCacheFactory, DataCache

    public class AppFabricCacheProvider : ICacheProvider
    {
        // Negative on purpose: UtcNow.AddMinutes(RefreshWindowMinutes) means "5 minutes ago",
        // so the cache is rebuilt at most once every 5 minutes.
        private const int RefreshWindowMinutes = -5;
    
        private static readonly object Locker = new object();
        private static AppFabricCacheProvider _instance;
        private static DataCache _cache;
        private static DateTime _lastRefreshDate;
    
        private AppFabricCacheProvider()
        {
        }
    
        public static AppFabricCacheProvider GetInstance()
        {
            lock (Locker)
            {
                if (_instance == null)
                {
                    _instance = new AppFabricCacheProvider();
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                    _lastRefreshDate = DateTime.UtcNow;
                }
            }
            return _instance;
        }
    
        // Replaces _cache with one obtained from a brand-new DataCacheFactory,
        // unless the last rebuild happened within the refresh window.
        private static void ForceRefresh()
        {
            lock (Locker)
            {
                if (_instance != null && DateTime.UtcNow.AddMinutes(RefreshWindowMinutes) > _lastRefreshDate)
                {
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                    _lastRefreshDate = DateTime.UtcNow;
                }
            }
        }
    
        ...
    
        public T Put<T>(string key, T value)
        {
            try
            {
                _cache.Put(key, value);
            }
            catch (SocketException)
            {
                // Rebuild the client, then retry once; a second failure propagates to the caller.
                ForceRefresh();
                _cache.Put(key, value);
            }
            return value;
        }
    }
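The same refresh-and-retry idea can be pulled out of the provider into a small, testable wrapper. This is a sketch, not code from the thread, and it does not reference AppFabric at all: `ICacheClient`, `RefreshingCacheClient` and `RefreshCount` are hypothetical names, and the client is built by a caller-supplied delegate. Two behaviors differ from the workaround above, both as assumptions: the exceptions that trigger a rebuild are chosen by a predicate (so it could, for example, also match the `DataCacheException` timeouts listed in the question), and a rebuild is skipped if another thread has already replaced the failed client.

```csharp
using System;

// Hypothetical minimal cache-client abstraction (stands in for AppFabric's DataCache).
public interface ICacheClient
{
    void Put(string key, object value);
}

public sealed class RefreshingCacheClient
{
    private readonly object _gate = new object();
    private readonly Func<ICacheClient> _createClient;   // builds a fresh client
    private readonly Func<Exception, bool> _isTransient; // which failures justify a rebuild
    private readonly TimeSpan _minRefreshInterval;       // at most one rebuild per interval
    private volatile ICacheClient _client;
    private DateTime _lastRefreshUtc;

    public int RefreshCount { get; private set; }

    public RefreshingCacheClient(Func<ICacheClient> createClient,
                                 Func<Exception, bool> isTransient,
                                 TimeSpan minRefreshInterval)
    {
        _createClient = createClient;
        _isTransient = isTransient;
        _minRefreshInterval = minRefreshInterval;
        _client = createClient();
        _lastRefreshUtc = DateTime.UtcNow;
    }

    public void Put(string key, object value)
    {
        ICacheClient client = _client;
        try
        {
            client.Put(key, value);
        }
        catch (Exception ex) when (_isTransient(ex))
        {
            // Rebuild if allowed, then retry exactly once; a second failure propagates.
            RefreshIfStale(client);
            _client.Put(key, value);
        }
    }

    private void RefreshIfStale(ICacheClient failed)
    {
        lock (_gate)
        {
            // Another thread may already have swapped in a new client.
            if (!ReferenceEquals(_client, failed)) return;
            // Rate limit: skip the rebuild if the last one was too recent.
            if (DateTime.UtcNow - _lastRefreshUtc < _minRefreshInterval) return;
            _client = _createClient();
            _lastRefreshUtc = DateTime.UtcNow;
            RefreshCount++;
        }
    }
}
```

In the provider, the delegate would presumably be `() => new DataCacheFactory().GetCache("AdMatter")` wrapped in an adapter to the hypothetical `ICacheClient`, with `ex => ex is SocketException` as the predicate and `TimeSpan.FromMinutes(5)` as the interval. As in the original workaround, only one retry is attempted; inside the refresh window the retry goes to the same client, so it will usually fail again and surface the exception to the caller.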
    

    Will update this thread when we learn more.