Search code examples
amazon-redshiftaws-step-functions

Calling a redshift stored procedure through step function does not wait to finish and triggers next job


When I am calling redshift query through AWS step function it does not wait for this step to finish and triggers next job.

I tried checking wait for callback option but then redshift step keeps on running and the job does not complete. I need to manually kill it.


Solution

  • The .waitForCallback integration pattern will cause Step Functions to make the API call described in the task, then wait for you to call back to Step Functions using the SendTaskSuccess or SendTaskFailure API Actions, including a Task Token that you sent along with the API call. In the case of arn:aws:states:::aws-sdk:redshiftdata:executeStatement, I don't think there's a feasible way to make that work, so it's not a fit for your use case.

    For certain of our Optimized Service Integrations, we support the Run a Job (.sync) integration pattern, but we do not have such an integration for Redshift data.

    For these scenarios, where you have an API that implements an asynchronous interaction, the best solution is a polling loop like the one shown below. This will run the query, try to load the result, then wait 5 seconds if it's not ready and then try again.

    You can build this loop into each of your workflows. Or if this is something you think you will want to re-use in many workflows, you can create a specific state machine to encapsulate the action, then call it from any of your state machines using the Optimized Service Integration for Step Functions, which does support .sync.

    enter image description here

    {
      "StartAt": "ExecuteStatement",
      "States": {
        "ExecuteStatement": {
          "Type": "Task",
          "Parameters": {
            "ClusterIdentifier": "example-cluster",
            "Database": "sample_data_dev",
            "Sql": "SELECT sum(qtysold) FROM tickit.sales, tickit.date WHERE sales.dateid = date.dateid AND caldate = '2008-01-05'"
          },
          "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
          "Next": "GetStatementResult"
        },
        "GetStatementResult": {
          "Type": "Task",
          "Parameters": {
            "Id.$": "$.Id"
          },
          "Resource": "arn:aws:states:::aws-sdk:redshiftdata:getStatementResult",
          "Next": "Has the query finished?",
          "Catch": [
            {
              "ErrorEquals": [
                "RedshiftData.ResourceNotFoundException"
              ],
              "Next": "Wait to let the query complete",
              "ResultPath": "$.getStatementResultError"
            }
          ]
        },
        "Has the query finished?": {
          "Type": "Choice",
          "Choices": [
            {
              "And": [
                {
                  "Variable": "$.ColumnMetadata",
                  "IsPresent": true
                }
              ],
              "Next": "Success"
            }
          ],
          "Default": "Wait to let the query complete"
        },
        "Success": {
          "Type": "Succeed"
        },
        "Wait to let the query complete": {
          "Type": "Wait",
          "Seconds": 5,
          "Next": "GetStatementResult"
        }
      }
    }