Is there anything aside from setting Secondaries=1 in the cluster configuration to enable High Availability, specifically on the cache client configuration?
Our configuration:
With the above configuration, we see primary and secondary regions created on the three hosts. However, when one of the hosts is stopped, the following exceptions occur:
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
An existing connection was forcibly closed by the remote host
No connection could be made because the target machine actively refused it 192.22.0.34:22233
An existing connection was forcibly closed by the remote host
Isn't the point of High Availability to be able to handle hosts going down without interrupting service? We are using a named region; does this break High Availability? I read somewhere that named regions can only exist on one host (I did verify that a secondary does exist on another host). I feel like we're missing something in the cache client configuration that would enable High Availability. Any insight on the matter would be greatly appreciated.
After opening a ticket with Microsoft, we narrowed it down to having a static DataCacheFactory object.
    public class AppFabricCacheProvider : ICacheProvider
    {
        private static readonly object Locker = new object();
        private static AppFabricCacheProvider _instance;
        private static DataCache _cache;

        private AppFabricCacheProvider()
        {
        }

        public static AppFabricCacheProvider GetInstance()
        {
            lock (Locker)
            {
                if (_instance == null)
                {
                    _instance = new AppFabricCacheProvider();
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                }
            }
            return _instance;
        }
        ...
    }
Looking at the trace logs from AppFabric, the clients are still trying to connect to all the hosts without handling hosts going down. Resetting IIS on the clients forces a new DataCacheFactory to be created (in our App_Start) and stops the exceptions.
The MS engineers agreed that keeping a single static DataCacheFactory is the recommended best practice (we also found several articles about this: see link and link).
They are continuing to investigate a solution for us. In the meantime, we have come up with the following temporary workaround, where we force a new DataCacheFactory object to be created if we encounter one of the above exceptions.
    public class AppFabricCacheProvider : ICacheProvider
    {
        // Negative so it can be passed straight to AddMinutes: a refresh is
        // allowed only when the last one happened more than 5 minutes ago.
        private const int RefreshWindowMinutes = -5;
        private static readonly object Locker = new object();
        private static AppFabricCacheProvider _instance;
        private static DataCache _cache;
        private static DateTime _lastRefreshDate;

        private AppFabricCacheProvider()
        {
        }

        public static AppFabricCacheProvider GetInstance()
        {
            lock (Locker)
            {
                if (_instance == null)
                {
                    _instance = new AppFabricCacheProvider();
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                    _lastRefreshDate = DateTime.UtcNow;
                }
            }
            return _instance;
        }

        // Replaces the cached DataCache with one from a fresh DataCacheFactory,
        // throttled so that a burst of failures triggers only one rebuild.
        private static void ForceRefresh()
        {
            lock (Locker)
            {
                if (_instance != null && DateTime.UtcNow.AddMinutes(RefreshWindowMinutes) > _lastRefreshDate)
                {
                    var factory = new DataCacheFactory();
                    _cache = factory.GetCache("AdMatter");
                    _lastRefreshDate = DateTime.UtcNow;
                }
            }
        }
        ...
        public T Put<T>(string key, T value)
        {
            try
            {
                _cache.Put(key, value);
            }
            catch (SocketException)
            {
                // Rebuild the client (if the window allows) and retry once.
                ForceRefresh();
                _cache.Put(key, value);
            }
            return value;
        }
    }
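The same refresh-and-retry idea can be pulled out of the provider so that every cache call shares it. The following is only a minimal sketch under some assumptions: RefreshableResource<T>, its Execute method, and the injectable clock are hypothetical names introduced here for illustration and are not part of AppFabric. The creation delegate stands in for `new DataCacheFactory().GetCache("AdMatter")`.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper, not part of AppFabric: holds a resource built by a
// caller-supplied delegate and rebuilds it at most once per refresh window.
public sealed class RefreshableResource<T> where T : class
{
    private readonly object _locker = new object();
    private readonly Func<T> _create;         // e.g. () => new DataCacheFactory().GetCache("AdMatter")
    private readonly Func<DateTime> _utcNow;  // injectable clock so the throttle can be tested
    private readonly TimeSpan _refreshWindow;
    private T _current;
    private DateTime _lastRefreshUtc;

    public RefreshableResource(Func<T> create, TimeSpan refreshWindow, Func<DateTime> utcNow = null)
    {
        if (create == null) throw new ArgumentNullException("create");
        _create = create;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
        _refreshWindow = refreshWindow;
        _current = _create();
        _lastRefreshUtc = _utcNow();
    }

    public T Current
    {
        get { lock (_locker) { return _current; } }
    }

    // Rebuilds the resource only if the last rebuild is older than the window.
    // Returns true when a rebuild actually happened.
    public bool ForceRefresh()
    {
        lock (_locker)
        {
            DateTime now = _utcNow();
            if (now - _lastRefreshUtc <= _refreshWindow)
            {
                return false;
            }
            _current = _create();
            _lastRefreshUtc = now;
            return true;
        }
    }

    // Runs the action; on SocketException, refreshes (if allowed) and retries once.
    public void Execute(Action<T> action)
    {
        try
        {
            action(Current);
        }
        catch (SocketException)
        {
            ForceRefresh();
            action(Current);
        }
    }
}
```

With something like this in place, Put could be reduced to a call such as `resource.Execute(cache => cache.Put(key, value))`. As in the workaround above, a second failure after the retry still propagates to the caller.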
Will update this thread when we learn more.