
How to make changes or fixes to Uber Cadence Workflow without breaking determinism?


What is the recommended practice for upgrading running workflows?

If executions started with a previous workflow implementation are still running, any code change or update to the workflow logic results in a "Non Deterministic Error" from Cadence, because it cannot replay the history of those existing executions against the updated implementation.
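To make the failure mode concrete, here is a toy Go model of replay. It does not use the Cadence client: `replay`, `errNonDeterministic`, and the two workflow functions are hypothetical names, and a real history holds much richer events than a list of activity names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonDeterministic is a hypothetical stand-in for the error reported when
// replayed code diverges from the recorded history.
var errNonDeterministic = errors.New("non-deterministic workflow")

// replay checks that the activities requested by the code match, in order,
// the activities already recorded in history.
func replay(history []string, code func() []string) error {
	requested := code()
	for i, recorded := range history {
		if i >= len(requested) || requested[i] != recorded {
			return fmt.Errorf("%w: event %d recorded %q", errNonDeterministic, i, recorded)
		}
	}
	return nil
}

// oldWorkflow is the implementation the running execution was started with.
func oldWorkflow() []string { return []string{"foo"} }

// newWorkflow is the updated implementation deployed while it was running.
func newWorkflow() []string { return []string{"bar"} }

func main() {
	history := []string{"foo"} // written while the old code was deployed

	fmt.Println(replay(history, oldWorkflow)) // old code replays cleanly
	fmt.Println(replay(history, newWorkflow)) // new code diverges at event 0
}
```

In this model the second call fails because the updated code asks for `bar` where the history recorded `foo`, which is the situation I am hitting.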

What are some strategies to deal with upgrades without breaking existing workflow executions?


Solution

  • GetVersion is used to safely make backwards-incompatible changes to workflow definitions. Workflow code cannot simply be changed while workflows are running, because that breaks determinism. The solution is to keep both the old code, which is used to replay existing workflows, and the new code, which is used when that point in the workflow is executed for the first time. GetVersion returns the maxSupported version when it is executed for the first time, and this version is recorded in the workflow history as a marker event. Even if maxSupported is later changed, the version that was recorded is returned on replay. The DefaultVersion constant stands for the version of code that was not versioned before. For example, suppose a workflow initially has the following code:

    err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
    

    It should be updated to:

    err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
    

    The backwards-compatible way to make this change is:

    v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 1)
    if v == workflow.DefaultVersion {
        // Executions started before the change keep running foo.
        err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
    } else {
        err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
    }
    

    Then bar has to be changed to baz:

    v := workflow.GetVersion(ctx, "fooChange", workflow.DefaultVersion, 2)
    if v == workflow.DefaultVersion {
        err = workflow.ExecuteActivity(ctx, foo).Get(ctx, nil)
    } else if v == 1 {
        err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
    } else {
        err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
    }
    

    Later, when no workflows are still running on DefaultVersion, the corresponding branch can be removed:

    v := workflow.GetVersion(ctx, "fooChange", 1, 2)
    if v == 1 {
        err = workflow.ExecuteActivity(ctx, bar).Get(ctx, nil)
    } else {
        err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
    }
    

    Currently there is no supported way to completely remove a GetVersion call once it has been introduced. Keep it even if only a single branch is left:

    workflow.GetVersion(ctx, "fooChange", 2, 2)
    err = workflow.ExecuteActivity(ctx, baz).Get(ctx, nil)
    

    This is necessary because GetVersion validates the version against the workflow history and fails the decision if the workflow code is not compatible with it.

    The Java client has a similar Workflow.getVersion API.
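The marker behaviour described above can be sketched with a small in-memory model. This is not the Cadence client: `toyContext` and the lower-case `getVersion` are hypothetical, and the model returns an error where the real call fails the decision. It assumes, as in the Go client, that `DefaultVersion` is -1 and that a replay with no recorded marker yields `DefaultVersion`.

```go
package main

import "fmt"

// Version and DefaultVersion mirror the names used in the answer.
type Version int

const DefaultVersion Version = -1

// toyContext is a hypothetical stand-in for one execution's state:
// whether it is being replayed, and the version markers in its history.
type toyContext struct {
	replaying bool
	markers   map[string]Version
}

// getVersion models the described semantics: record maxSupported on first
// execution, return the recorded value on replay, and reject any version
// outside [minSupported, maxSupported].
func getVersion(ctx *toyContext, changeID string, minSupported, maxSupported Version) (Version, error) {
	v, recorded := ctx.markers[changeID]
	if !recorded {
		if ctx.replaying {
			// The history predates the GetVersion call: unversioned code.
			v = DefaultVersion
		} else {
			// First execution: take the newest version and record a marker.
			v = maxSupported
			ctx.markers[changeID] = v
		}
	}
	if v < minSupported || v > maxSupported {
		return 0, fmt.Errorf("change %q: version %d outside supported range [%d, %d]",
			changeID, v, minSupported, maxSupported)
	}
	return v, nil
}

func main() {
	// A new execution on code supporting DefaultVersion..1 records version 1.
	fresh := &toyContext{markers: map[string]Version{}}
	v, _ := getVersion(fresh, "fooChange", DefaultVersion, 1)
	fmt.Println(v) // 1

	// Replaying it after maxSupported moved to 2 still returns the marker.
	fresh.replaying = true
	v, _ = getVersion(fresh, "fooChange", DefaultVersion, 2)
	fmt.Println(v) // 1

	// Replaying an execution that started before versioning was added.
	legacy := &toyContext{replaying: true, markers: map[string]Version{}}
	v, _ = getVersion(legacy, "fooChange", DefaultVersion, 2)
	fmt.Println(v) // -1

	// Once the DefaultVersion branch is dropped, that execution is rejected.
	_, err := getVersion(legacy, "fooChange", 1, 2)
	fmt.Println(err)
}
```

The last call shows why the DefaultVersion branch should be removed only after every unversioned execution has finished, and why the call itself stays in place: without it there would be nothing to check the history against.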