Search code examples
azure-service-fabric

How to deal with long-lasting operations in Reliable Actors or stateful Reliable Service and 're-process' failed states


I'm new to Service Fabric Reliable Actors technology and trying to figure out best practices for this specific scenario:

Let's say we have some legacy code that we want to run new code built on SF Reliable Actors. Actors of certain type "ActorExecutor" are going to asynchronously call some third-party service that sometimes could stuck for pretty long time, longer than actor's calling client is ready to wait, or even experience some prolonged underling communication issues. We do not want client (legacy code) to get blocked by any sort of issues in ActorExecutor, it does not expect to receive any value or status back from actor. Should we use SF ReliableQueue for that? Should we use some sort of actor-broker to receive requests from client and storing them to queue: Client->ActorBroker->ActorExecutor? Are reminders could be helpful here?

One more question in this regard: Giving the situation is possible when many thousands of actors might stuck in 'third-party incomplete call' in the same time, and we want to reactivate and repeat the very last call for them, should we write a new tool for that? In NServiceBus you can create an error queue in MSMQ where all failed like 'unable to process' messages to be landed, and then we were able to simply re-process them anytime in the future. From my understanding, there is no such thing in Service Fabric and it's something we need to built on our own.


Solution

  • An event driven approach can help you here. Instead of waiting for the Actor to return from the call to a service, you can enqueue some task on it, to request it to perform some action. The service calling Actor would function autonomously, processing items from it's task queue. This will allow it to perform retries and error handling. After a successful call, a new event can notify the rest of the system.

    Maybe this project can help you to get started.

    edits:

    • At this time, I don't believe you can use reliable collections in Actors. So a queue inside the state of an Actor, is a regular (read-only) collection.
    • Process the queue using an Actor Timer. Don't use the threadpool, as it's not persistent and won't survive crashes and Actor garbage collections.