c# azure-service-fabric

Are secondary replicas essential in Service Fabric?


I am new to stateful services. I have a requirement to spread my data across my cluster using reliable collections.

This is fine.

The confusion I have is with secondary replicas. My system writes data to a database.

Looking at the documentation, secondary replicas are designed to hold state as well. However, they will never really be accurate as I don't want them to write to the database.

So are they actually needed in my situation? How can I use stateful services for just partitioning my data across the cluster without worrying about replicas? Am I misunderstanding something?


Solution

  • If you are using Reliable Collections as your primary storage, then yes, the secondary replicas are essential: they keep your data replicated across different nodes and maintain its availability in case of a node failure.

    Because you are also writing the same data to a database, you don't run the same risk of losing it, but it would still be beneficial to keep the replicas in the cluster. If a node holding your data goes down (which is very common during host updates and hardware failures), you would have to re-sync the new service instances from the database, and that sync can take too long when the data is big, so your services would have to wait for the synchronization to finish before they can be used again. If the data is already replicated on other nodes, the only step required on failure is electing a secondary to become the primary, continuing the processing with the smallest downtime possible.
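    For reference, here is a minimal sketch of what a write to a Reliable Collection looks like inside a stateful service; the OrderService and Order names and the "orders" dictionary are hypothetical placeholders, and the runtime replicates the committed write to the secondaries for you:

        // Write to a replicated Reliable Collection; CommitAsync returns only
        // after the change is quorum-committed across the replicas.
        using System.Fabric;
        using System.Threading.Tasks;
        using Microsoft.ServiceFabric.Data;
        using Microsoft.ServiceFabric.Data.Collections;
        using Microsoft.ServiceFabric.Services.Runtime;

        public sealed class OrderService : StatefulService
        {
            public OrderService(StatefulServiceContext context) : base(context) { }

            public async Task SaveOrderAsync(string orderId, Order order)
            {
                // Creates the dictionary on first use, returns it afterwards.
                var orders = await this.StateManager
                    .GetOrAddAsync<IReliableDictionary<string, Order>>("orders");

                using (ITransaction tx = this.StateManager.CreateTransaction())
                {
                    await orders.AddOrUpdateAsync(tx, orderId, order, (key, old) => order);
                    await tx.CommitAsync();
                }
            }
        }

        public class Order { public string Id { get; set; } } // hypothetical payload type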

    In my opinion, what you are doing just adds extra overhead, because the performance you gain by keeping the data in the nodes is lost when you have to write it back to the database, unless this is static data used only for querying. There is also the complexity of keeping the two stores in sync: you might, for example, fail while saving to the database after you have already written to the Reliable Collection (or vice versa), forcing you to handle rollbacks across different storages or accept that they drift out of sync.

    Maybe you could consider replacing the stateful service with a stateless one and adding a cache layer between your service and the database: every call to get an item first checks whether it is already in the cache and, if not, gets it from the database and adds it to the cache. For the cache you could use one of the options below (a cache-aside sketch follows the list):

    • In-process cache via MemoryCache if you are using .NET;
    • Azure Redis Cache in the same region/zone as your cluster;
    • Redis cache in the same cluster as a GuestExecutable (separate node);
    • Redis cache as a sidecar, deployed alongside your services on the same node.
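    Here is a minimal cache-aside sketch for the first option, using Microsoft.Extensions.Caching.Memory; the Item type and LoadItemFromDatabaseAsync are hypothetical placeholders for your own data access code:

        // Cache-aside sketch: check the cache first, fall back to the database
        // on a miss, then populate the cache for subsequent calls.
        using System;
        using System.Threading.Tasks;
        using Microsoft.Extensions.Caching.Memory;

        public class ItemRepository
        {
            private readonly IMemoryCache _cache = new MemoryCache(new MemoryCacheOptions());

            public Task<Item> GetItemAsync(string id)
            {
                // GetOrCreateAsync only invokes the factory (the database call)
                // when the key is not already cached.
                return _cache.GetOrCreateAsync(id, entry =>
                {
                    entry.SlidingExpiration = TimeSpan.FromMinutes(5); // illustrative expiry
                    return LoadItemFromDatabaseAsync(id);
                });
            }

            private Task<Item> LoadItemFromDatabaseAsync(string id)
            {
                // Placeholder for the real database query.
                return Task.FromResult(new Item { Id = id });
            }
        }

        public class Item { public string Id { get; set; } }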

    You can also get the partitioning concept of Service Fabric with stateless services, where each service instance is responsible for a subset of the data; the first section of this documentation explains that. A routing sketch follows.
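    As an illustration (not Service Fabric's own routing, just a hand-rolled sketch), you could map each item key to one of a fixed number of partitions with a stable hash; string.GetHashCode is not stable across processes, so an FNV-1a hash is used here:

        // Sketch: stable key-to-partition mapping. The FNV-1a constants are the
        // standard 64-bit ones; the partition count is whatever you choose.
        static long Fnv1aHash(string key)
        {
            unchecked
            {
                const long fnvPrime = 1099511628211;
                long hash = (long)14695981039346656037UL;
                foreach (char c in key)
                {
                    hash ^= c;
                    hash *= fnvPrime;
                }
                return hash;
            }
        }

        static int GetPartition(string key, int partitionCount)
        {
            // Normalize the remainder so negative hashes still map into range.
            long remainder = Fnv1aHash(key) % partitionCount;
            return (int)(remainder < 0 ? remainder + partitionCount : remainder);
        }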

    Regarding your arguments about using Redis: Redis is one of the most reliable cache solutions out there, and I don't think concurrency will be an issue for you. It can also be deployed as part of your cluster as a GuestExecutable or as a Container (preferred). A usage sketch follows.
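    For completeness, the same cache-aside pattern against Redis with the StackExchange.Redis client might look like this; the endpoint "localhost:6379", the five-minute expiry, and LoadFromDatabaseAsync are assumptions:

        // Cache-aside against Redis via StackExchange.Redis.
        using System;
        using System.Threading.Tasks;
        using StackExchange.Redis;

        public static class RedisCacheExample
        {
            private static readonly ConnectionMultiplexer Connection =
                ConnectionMultiplexer.Connect("localhost:6379"); // assumed endpoint

            public static async Task<string> GetItemJsonAsync(string id)
            {
                IDatabase db = Connection.GetDatabase();

                // Return the cached value when present.
                RedisValue cached = await db.StringGetAsync(id);
                if (cached.HasValue)
                    return cached;

                // Miss: load from the database and cache with an expiry.
                string json = await LoadFromDatabaseAsync(id); // hypothetical DB call
                await db.StringSetAsync(id, json, TimeSpan.FromMinutes(5));
                return json;
            }

            private static Task<string> LoadFromDatabaseAsync(string id) =>
                Task.FromResult($"{{\"id\":\"{id}\"}}"); // stand-in for a real query
        }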

    Unless the DB + cache is a bottleneck in your current situation, 100,000 items is almost nothing, and any DB system can handle that very well. I would recommend you stick with the DB-only solution, because it is more mature and the internet is loaded with content covering the majority of the use cases. Adopting Reliable Collections will add complexity and maintenance to your solution without giving much benefit at this scale.