design-patterns, microservices, cqrs, event-sourcing, saga

Can the Saga pattern help reverse payouts if a failure occurs?


I am very new to the Saga pattern. As I understand it, a saga lets us reverse earlier steps if a failure occurs.

The examples I've seen mostly look like Orders Service -> Payment Service -> Other Service. In the Payment Service, funds flow from the customer to the merchant. If a failure occurs in Other Service, the payment can be reversed, because in the compensating step funds flow from the merchant back to the customer.

But my question is about the opposite scenario: Payouts Service -> Customer Service

In the Payouts Service, funds flow from the merchant to the customer.

Can a saga reverse a payout transaction if a failure occurs in the Customer Service (i.e., move the funds back from the customer to the merchant)?

Is this possible with a saga? I hope my question is clear, and I would be glad if someone could help.


Solution

  • The saga pattern lets you coordinate a number of operations and, if one of the steps fails, coordinate the steps that reverse what has been done so far. You can think of it as a way of making all the steps transactional without an actual transaction.

    But sagas cannot change the nature of each step. Charging a customer is an operation that we can reverse, because we can create a compensating action such as returning the funds to the customer. Paying out money to a customer is an operation that we cannot reverse, because once the money is in the customer's hands we cannot take it back. A saga can therefore help you make sure that either all the steps have happened or nothing has happened, but it won't let you reverse things that cannot be reversed.

    In this case, what you can do is leave the operation that cannot be reversed for the end. For example, deduct the money from the customer's wallet (which is an operation on a database you own) and then do the actual payout. If the payout fails, the saga lets you reverse the deduction from the customer's wallet, so no "damage" will have been caused.
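The ordering idea can be sketched in a few lines of Python. This is only an illustration, not a production saga: `Step`, `run_saga`, the in-memory `wallet`, and the always-failing `failing_payout` are all hypothetical names invented for the example, and everything runs synchronously in one process.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]  # no-op for the final, irreversible step


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for finished in reversed(done):
                finished.compensate()
            return False
        done.append(step)
    return True


# Hypothetical in-memory wallet standing in for a database you own.
wallet = {"balance": 100}


def reserve(amount: int) -> None:
    if wallet["balance"] < amount:
        raise ValueError("insufficient funds")
    wallet["balance"] -= amount


def release(amount: int) -> None:
    wallet["balance"] += amount


def failing_payout() -> None:
    # Simulates the external payout being rejected.
    raise RuntimeError("payout provider rejected the transfer")


# The reversible wallet deduction goes first, the irreversible payout last.
ok = run_saga([
    Step("reserve", lambda: reserve(30), lambda: release(30)),
    Step("payout", failing_payout, lambda: None),
])
```

Because the payout is the last step, its failure only ever triggers compensation of steps that can actually be undone.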


    Update: extra information added based on the comment

    Sagas are (normally) implemented with messaging. The saga is the orchestrator; it doesn't perform the actual steps itself. Instead, it sends out commands so that other processes perform the steps, and it receives messages or events with the results of the operations.
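As a rough sketch of that command/event exchange, assuming two in-memory queues in place of a real message broker: the `Command` and `Event` shapes, the `PayoutSaga` class, and `wallet_worker` are hypothetical and only illustrate the direction in which messages flow.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    saga_id: str
    name: str


@dataclass(frozen=True)
class Event:
    saga_id: str
    name: str
    success: bool


# Stand-ins for a message broker (assumption: one queue per direction).
command_queue = deque()
event_queue = deque()


class PayoutSaga:
    """Orchestrator: it only sends commands and reacts to result events."""

    def __init__(self, saga_id: str) -> None:
        self.saga_id = saga_id
        self.state = "started"

    def start(self) -> None:
        command_queue.append(Command(self.saga_id, "reserve_funds"))

    def on_event(self, event: Event) -> None:
        if event.name == "reserve_funds_result":
            self.state = "funds_reserved" if event.success else "failed"


def wallet_worker() -> None:
    """A separate process in practice; consumes one command and reports back."""
    command = command_queue.popleft()
    event_queue.append(Event(command.saga_id, command.name + "_result", success=True))


saga = PayoutSaga("payout-1")
saga.start()
wallet_worker()
saga.on_event(event_queue.popleft())
```

The orchestrator never touches the wallet itself; it only tracks which result it is waiting for and what to send next.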

    Also, note that the steps executed by a saga can have transient failures or unrecoverable/permanent failures. Transient failures can be retried and will most likely succeed eventually. Permanent failures won't succeed no matter how many times you retry.
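One way to express that distinction in code is with two exception types, assuming each step can classify its own failures. `TransientError`, `PermanentError`, and `call_with_retries` are hypothetical names for this sketch, and the fixed delay is a simplification.

```python
import time


class TransientError(Exception):
    """Failure that may go away on retry (e.g. DB not available)."""


class PermanentError(Exception):
    """Failure that retrying cannot fix (e.g. not enough funds)."""


def call_with_retries(operation, max_attempts=3, delay_seconds=0.0):
    """Retry only transient failures; permanent ones propagate immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay_seconds)


attempts = {"count": 0}


def flaky_db_update():
    # Simulated step: fails twice with a transient error, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("db not available")
    return "updated"


result = call_with_retries(flaky_db_update)
```

A `PermanentError` is not caught by the helper, so it reaches the caller on the first attempt and can be reported back to the saga as a failed step.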

    Considering all this, I would solve the problem as follows:

    1. Send a command to update the DB and store that the transaction is happening. This locks the funds so that, even if the payout hasn't happened yet, you cannot initiate another one and end up paying out too much money. Failures here can be transient (DB not available) or permanent (not enough funds). You can retry transient failures until they succeed and send a notification for permanent failures. Send a message back to the saga with the result of the operation.
    2. If step 1 succeeded, send a command to execute the payout. Transient failure: external service not available; retry up to X times. Permanent failure: payout data invalid. If the payout cannot happen, the saga will compensate step 1 and free up the funds in the wallet. Send a message back to the saga with the result: either a failure or the response from the external HTTP service.
    3. If the payout failed, send a command to roll back step 1. If it succeeded, send a command to confirm the payout and store the response in the DB. Note that here you shouldn't have permanent failures while updating the DB. At most, you can have transient failures (DB not available), and retrying should eventually manage to update the DB. If the operation cannot succeed even after many retries, you'll need manual intervention to fix the issue, as you can't roll back step 2. You should model your process so that this step is reliable (validations and other things that can cause permanent failures should be caught in step 1 or 2).
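The three steps above can be sketched as a small state machine. This is a simplified, synchronous illustration under several assumptions: `Wallet`, `PayoutRecord`, the status names, and the `execute_payout` callback are hypothetical, retries and messaging are left out, and `execute_payout` is assumed to raise only when the payout definitely did not happen.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Wallet:
    available: int
    reserved: int = 0


@dataclass
class PayoutRecord:
    amount: int
    status: str = "NEW"  # NEW -> RESERVED -> CONFIRMED | RELEASED, or REJECTED
    provider_response: Optional[str] = None


def reserve_funds(wallet: Wallet, record: PayoutRecord) -> bool:
    """Step 1: lock the funds so a second payout cannot spend them."""
    if wallet.available < record.amount:
        record.status = "REJECTED"  # permanent failure: not enough funds
        return False
    wallet.available -= record.amount
    wallet.reserved += record.amount
    record.status = "RESERVED"
    return True


def settle(wallet: Wallet, record: PayoutRecord, response: Optional[str]) -> None:
    """Step 3: confirm on success, or roll back step 1 on failure."""
    wallet.reserved -= record.amount
    if response is None:
        wallet.available += record.amount
        record.status = "RELEASED"
    else:
        record.provider_response = response
        record.status = "CONFIRMED"


def run_payout(wallet: Wallet, record: PayoutRecord,
               execute_payout: Callable[[int], str]) -> PayoutRecord:
    if not reserve_funds(wallet, record):
        return record
    try:
        response = execute_payout(record.amount)  # Step 2: external call
    except Exception:
        response = None
    settle(wallet, record, response)
    return record


def rejected_by_provider(amount: int) -> str:
    raise RuntimeError("payout data invalid")


good_wallet = Wallet(available=100)
good = run_payout(good_wallet, PayoutRecord(40), lambda amount: "tx-1")

bad_wallet = Wallet(available=100)
bad = run_payout(bad_wallet, PayoutRecord(40), rejected_by_provider)
```

In the successful run the 40 units leave the wallet and the provider response is stored; in the failed run the reservation is released and the wallet ends up unchanged. Only `settle` has no compensation of its own, which is why it is kept to plain DB updates that retrying can complete.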

    I hope that makes sense. Make sure that you cover all scenarios and that the result messages can be sent reliably even when an operation fails.