I am just getting started writing some dynamic endpoint discovery for my Service Fabric application and was looking for examples of how to resolve service endpoints. I found the following code example on Stack Overflow:
https://stackoverflow.com/a/38562986/4787510
I made some minor changes to it, so here is my code:
    using System;
    using System.Fabric;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.ServiceFabric.Services.Client;

    private readonly FabricClient m_fabricClient;

    public async Task RefreshEndpointList()
    {
        var appList = await m_fabricClient.QueryManager.GetApplicationListAsync();
        var app = appList.Single(x => x.ApplicationName.ToString().Contains("<MyFabricDeploymentName>"));

        // Go through all running services
        foreach (var service in await m_fabricClient.QueryManager.GetServiceListAsync(app.ApplicationName))
        {
            var partitions = await m_fabricClient.QueryManager.GetPartitionListAsync(service.ServiceName);

            // Go through all partitions
            foreach (var partition in partitions)
            {
                // Check what kind of service we have - depending on that the resolver figures out the endpoints.
                // E.g. Singleton is easy as it is just one endpoint, otherwise we need some load balancing later on
                ServicePartitionKey key;
                switch (partition.PartitionInformation.Kind)
                {
                    case ServicePartitionKind.Singleton:
                        key = ServicePartitionKey.Singleton;
                        break;
                    case ServicePartitionKind.Int64Range:
                        var longKey = (Int64RangePartitionInformation)partition.PartitionInformation;
                        key = new ServicePartitionKey(longKey.LowKey);
                        break;
                    case ServicePartitionKind.Named:
                        var namedKey = (NamedPartitionInformation)partition.PartitionInformation;
                        key = new ServicePartitionKey(namedKey.Name);
                        break;
                    default:
                        throw new ArgumentOutOfRangeException($"Can't resolve partition kind for partition with id {partition.PartitionInformation.Id}");
                }

                var resolvedServicePartition = await ServicePartitionResolver.GetDefault().ResolveAsync(service.ServiceName, key, CancellationToken.None);

                // Record the partition's actual kind rather than a hard-coded Int64Range
                m_endpointCache.PutItem(service.ServiceTypeName, new ServiceDetail(service.ServiceTypeName, service.ServiceKind, partition.PartitionInformation.Kind, resolvedServicePartition.Endpoints));
            }
        }
    }
I'm quite happy I found this snippet, but while working through it, one thing confused me.
After reading through the SF docs, this seems to be the architecture from top to bottom, as far as I understood it:
Service Fabric cluster -> Service Fabric application (e.g. myApp_Fabric) -> Services (e.g. frontend service, profile picture microservice, backend services)
From the services we can drill down to partitions. As I understand it, a partition is basically a "container" on a node in my cluster in which multiple instances (replicas) can reside, the instances being actual deployments of a service.
I'm not quite sure I got the node / partition / replica difference right, though.
However, back to my confusion and actual question:
Why is the information about the partition strategy (singleton, Int64 range, named) attached to the partition information rather than to the service itself? As far as I understood it, a partition is basically the product of how I configured my service to be distributed across the Service Fabric nodes.
So, why is the partition strategy not directly tied to a service?
Regarding the services in Service Fabric, there are two types: stateful services and stateless services.
Stateless services do not keep state in reliable collections. If they need to maintain state, they have to rely on external persistence solutions such as databases. Since they do not deal with state provided by reliable collections, they typically get a Singleton partition.
Stateful services can store state in reliable collections. To be able to scale those services, the data in those collections should be divided over partitions. Each service instance is assigned a specific partition. The number of partitions is specified per service, as in the example below:
    <Service Name="Processing">
        <StatefulService ServiceTypeName="ProcessingType" TargetReplicaSetSize="3" MinReplicaSetSize="3">
            <UniformInt64Partition PartitionCount="26" LowKey="0" HighKey="25" />
        </StatefulService>
    </Service>
So, given the example above, I do not understand your remark that the partition strategy is not directly tied to a service: it is configured per service.
In the situation above, there will be 26 partitions, each served by its own replica set, so 26 * 3 (the target replica set size) = 78 replicas of that service will be running.
In the case of a stateless service, there will be just one partition (the singleton partition), so the number of actual instances is 1 * 3 (the instance count) = 3. (3 instances is just an example. Most of the time the instance count of a stateless service is set to -1, meaning one instance on every node in the cluster.)
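To make the arithmetic above concrete, here is a minimal sketch in plain C#. It does not call any Service Fabric API; the class and method names are made up for illustration, and the node count of 5 in the usage note is an assumption.

```csharp
using System;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class ReplicaCountSketch
{
    // A stateful service runs one replica set per partition,
    // so the total is partitions times replicas per set.
    public static int StatefulReplicaCount(int partitionCount, int targetReplicaSetSize)
    {
        return partitionCount * targetReplicaSetSize;
    }

    // A singleton-partitioned stateless service runs instanceCount instances;
    // an instance count of -1 means one instance on every node.
    public static int StatelessInstanceCount(int instanceCount, int nodeCount)
    {
        return instanceCount == -1 ? nodeCount : instanceCount;
    }
}
```

With the manifest above, `ReplicaCountSketch.StatefulReplicaCount(26, 3)` gives 78, and on an assumed 5-node cluster `ReplicaCountSketch.StatelessInstanceCount(-1, 5)` gives 5.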
One other thing: in your code there is a comment in the part that iterates over the partitions:
// E.g. Singleton is easy as it is just one endpoint, otherwise we need some load balancing later on
This comment is wrong in suggesting that partitioning has to do with load balancing. It does not; it determines how data is divided over the service instances, and you need the address of the instance that deals with a specific partition. Say I have a service with 26 partitions and I want to get data that is stored in, let's say, the 5th partition. I then need to get the endpoint for the instance that serves that partition.
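As a rough illustration of "which partition serves my data", here is a sketch in plain C# that maps a key to a partition index for a uniform Int64 range such as the one in the manifest above. It is not how Service Fabric is implemented internally. It assumes the key range divides evenly among the partitions, and the first-letter key scheme and all names are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the Service Fabric SDK.
public static class UniformPartitionSketch
{
    // Returns the zero-based index of the partition whose key range contains `key`.
    // Assumption: the key range divides evenly among the partitions
    // (and is small enough that highKey - lowKey + 1 does not overflow).
    public static int PartitionIndexFor(long key, long lowKey, long highKey, int partitionCount)
    {
        if (key < lowKey || key > highKey)
            throw new ArgumentOutOfRangeException(nameof(key), "Key lies outside the configured key range.");

        long totalKeys = highKey - lowKey + 1;
        if (partitionCount <= 0 || totalKeys % partitionCount != 0)
            throw new ArgumentException("This sketch only handles ranges that divide evenly.", nameof(partitionCount));

        long keysPerPartition = totalKeys / partitionCount;
        return (int)((key - lowKey) / keysPerPartition);
    }

    // Hypothetical key scheme: map the first letter of a name (A-Z) to a key in 0..25.
    public static long KeyForName(string name)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("Name must not be empty.", nameof(name));

        char first = char.ToUpperInvariant(name[0]);
        if (first < 'A' || first > 'Z')
            throw new ArgumentException("Name must start with a letter A-Z.", nameof(name));

        return first - 'A';
    }
}
```

For example, a name starting with "E" maps to key 4, which falls in the partition with index 4, i.e. the 5th partition. In a real client that key would be wrapped in `new ServicePartitionKey(key)` and passed to `ServicePartitionResolver.ResolveAsync`, as in the code from the question.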
You have probably already read the docs. If not, I suggest reading them as well.
Addressing your comments:
I was just wondering, is it not possible that multiple services run on the same partition?
Reliable collections are coupled to the service that uses them, and so are the underlying partitions. Hence, no more than one service can run on the same partition.
But multiple instances of the same service can. If a service has a replica set size of, let's say, 3, there will be 3 instances serving that partition. Only one of them is the primary, which reads and writes the data that gets replicated to the secondary instances.
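The primary/secondary split can be pictured with a small model in plain C#. The types below are hypothetical stand-ins, not the Service Fabric API, and the addresses are made up; the sketch only shows that a replica set of 3 has exactly one primary, which a client would target for reads and writes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only.
public enum ReplicaRoleSketch { Primary, Secondary }

public sealed class ReplicaEndpointSketch
{
    public ReplicaEndpointSketch(ReplicaRoleSketch role, string address)
    {
        Role = role;
        Address = address;
    }

    public ReplicaRoleSketch Role { get; }
    public string Address { get; }
}

public static class PrimarySelectionSketch
{
    // Returns the address of the single primary in a partition's replica set.
    // Single() throws if there is no primary or more than one.
    public static string PrimaryAddress(IEnumerable<ReplicaEndpointSketch> replicaSet)
    {
        return replicaSet.Single(e => e.Role == ReplicaRoleSketch.Primary).Address;
    }
}
```

Given a replica set with one `Primary` and two `Secondary` entries, `PrimaryAddress` returns the primary's address; the secondaries only receive the replicated data.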