
How to pass the Step Functions execution ID to a SageMaker batch transform job name?


I have created a Step Functions state machine. Its definition (step-function.json) is used in Terraform, with the task parameters following the syntax on this page: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html

The first time I execute this state machine, it creates a SageMaker batch transform job named example-jobname. But I need to execute this state machine every day, and on later runs it fails with the error "error": "SageMaker.ResourceInUseException", "cause": "Job name must be unique within an AWS account and region, and a job with this name already exists".

This happens because the job name is hard-coded as example-jobname: job names must be unique, so every execution after the first one fails. I would like to append a string (something like the execution ID) to the end of the job name. Here's what I have tried:

  1. I added "executionId.$": "States.Format('somestring {}', $$.Execution.Id)" to the Parameters section of the JSON file, but when I executed the state machine I got the error "error": "States.Runtime", "cause": "An error occurred while executing the state 'SageMaker CreateTransformJob' (entered at the event id #2). The Parameters '{\"BatchStrategy\":\"SingleRecord\",..............\"executionId\":\"somestring arn:aws:states:us-east-1:xxxxx:execution:xxxxx-state-machine:xxxxxxxx72950\"}' could not be used to start the Task: [The field \"executionId\" is not supported by Step Functions]"}

  2. I changed the job name in the JSON file to "TransformJobName": "example-jobname-States.Format('somestring {}', $$.Execution.Id)", and when I executed the state machine I got the error "error": "SageMaker.AmazonSageMakerException", "cause": "2 validation errors detected: Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}; Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must have length less than or equal to 63"
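Candidate job names can be checked locally against the two constraints quoted in the second error before running the state machine. A minimal Python sketch, assuming only the pattern and the 63-character limit from that message; is_valid_transform_job_name is a hypothetical helper, not part of any AWS SDK:

```python
import re

# Pattern and length limit copied from the SageMaker validation error above.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_JOB_NAME_LENGTH = 63


def is_valid_transform_job_name(name: str) -> bool:
    """Hypothetical helper: True if name satisfies both constraints from the error."""
    if len(name) > MAX_JOB_NAME_LENGTH:
        return False
    # fullmatch anchors the pattern at both ends, like ^...$ would.
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    candidates = [
        "example-jobname",
        "example-jobname-States.Format('somestring {}', $$.Execution.Id)",
        "somestring arn:aws:states:us-east-1:123456789012:execution:my-state-machine:run1",
    ]
    for candidate in candidates:
        print(is_valid_transform_job_name(candidate), candidate)
```

Only the first candidate passes: the literal intrinsic-function text contains dots, quotes and spaces, and an execution ARN contains colons, none of which the pattern allows.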

I have really run out of ideas; can someone help, please? Many thanks.


Solution

  • As per the documentation, a parameter whose value should be read from a path (the state input, or the context object via $$) must have a key ending in ".$", in the following format:

            "Parameters": {
                "ModelName.$": "$$.Execution.Name",  
                ....
            },
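The effect of the ".$" suffix can be modelled with a deliberately simplified Python sketch. It assumes plain dotted paths only ("$.a.b" for the state input, "$$.a.b" for the context object) and ignores intrinsic functions such as States.Format, arrays and JSONPath filters; resolve_parameters is a hypothetical helper, not Step Functions' actual implementation.

```python
from typing import Any


def _lookup(path: str, state_input: dict, context: dict) -> Any:
    """Resolve a simple dotted path such as '$.a.b' or '$$.Execution.Name'."""
    if path.startswith("$$."):
        node, keys = context, path[3:].split(".")
    elif path.startswith("$."):
        node, keys = state_input, path[2:].split(".")
    elif path == "$":
        return state_input
    else:
        raise ValueError(f"unsupported path in this sketch: {path}")
    for key in keys:
        node = node[key]
    return node


def resolve_parameters(params: dict, state_input: dict, context: dict) -> dict:
    """Hypothetical model: keys ending in '.$' are evaluated as paths, other values pass through literally."""
    resolved = {}
    for key, value in params.items():
        if key.endswith(".$"):
            # The '.$' is dropped from the key, and the value is looked up as a path.
            resolved[key[:-2]] = _lookup(value, state_input, context)
        elif isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input, context)
        else:
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    context = {"Execution": {"Name": "daily-run-42"}}
    with_suffix = {"TransformJobName.$": "$$.Execution.Name"}
    without_suffix = {"TransformJobName": "$$.Execution.Name"}
    print(resolve_parameters(with_suffix, {}, context))     # {'TransformJobName': 'daily-run-42'}
    print(resolve_parameters(without_suffix, {}, context))  # {'TransformJobName': '$$.Execution.Name'}
```

Without the suffix the path text is sent to SageMaker verbatim, which is why the second attempt in the question produced a job name containing the literal expression.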
    

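    The ".$" suffix on the key is what is missing from your definition: without it, Step Functions passes the value through as a literal string instead of evaluating it. Also use the execution name rather than the execution ID: $$.Execution.Id is the full execution ARN, which contains colons and is typically longer than 63 characters, so SageMaker would reject it with the same validation error as your second attempt. Your definition should therefore contain

    either

          "TransformJobName.$": "$$.Execution.Name",
    

    OR

          "TransformJobName.$": "States.Format('mytransformjob-{}', $$.Execution.Name)"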
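To compare what ends up in the job name when the placeholder is filled from $$.Execution.Id versus $$.Execution.Name, here is a rough Python emulation of States.Format for the simple case of "{}" placeholders only (escape sequences are ignored). The ARN is made up, and the execution name is assumed to be a UUID, which is what Step Functions generates when StartExecution is called without a name; states_format and is_valid_job_name are hypothetical helpers, not AWS APIs.

```python
import re
import uuid

JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def states_format(template: str, *args: object) -> str:
    """Hypothetical emulation of States.Format: fill each '{}' with the next argument."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("number of '{}' placeholders must match number of arguments")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += str(arg) + tail
    return out


def is_valid_job_name(name: str) -> bool:
    """Length limit and pattern from the SageMaker validation error."""
    return len(name) <= 63 and JOB_NAME_PATTERN.fullmatch(name) is not None


# Assumption: a default, auto-generated execution name (a UUID) and a made-up ARN around it.
execution_name = str(uuid.uuid4())
execution_id = f"arn:aws:states:us-east-1:123456789012:execution:my-state-machine:{execution_name}"

if __name__ == "__main__":
    for source in (execution_id, execution_name):
        job_name = states_format("mytransformjob-{}", source)
        print(is_valid_job_name(job_name), len(job_name), job_name)
```

With a UUID execution name the result is 51 characters and matches the pattern; a custom execution name would also need to be short enough and free of characters such as underscores.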
    

    Full state machine definition:

        {
            "Comment": "Defines the statemachine.",
            "StartAt": "Generate Random String",
            "States": {
                "Generate Random String": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:eu-central-1:1234567890:function:randomstring",
                    "ResultPath": "$.executionid",
                    "Parameters": {
                        "executionId.$": "$$.Execution.Id"
                    },
                    "Next": "SageMaker CreateTransformJob"
                },
                "SageMaker CreateTransformJob": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                    "Parameters": {
                        "BatchStrategy": "SingleRecord",
                        "DataProcessing": {
                            "InputFilter": "$",
                            "JoinSource": "Input",
                            "OutputFilter": "xxx"
                        },
                        "Environment": {
                            "SAGEMAKER_MODEL_SERVER_TIMEOUT": "300"
                        },
                        "MaxConcurrentTransforms": 100,
                        "MaxPayloadInMB": 1,
                        "ModelName": "${model_name}",
                        "TransformInput": {
                            "DataSource": {
                                "S3DataSource": {
                                    "S3DataType": "S3Prefix",
                                    "S3Uri": "${s3_input_path}"
                                }
                            },
                            "ContentType": "application/jsonlines",
                            "CompressionType": "Gzip",
                            "SplitType": "Line"
                        },
                        "TransformJobName.$": "$.executionid",
                        "TransformOutput": {
                            "S3OutputPath": "${s3_output_path}",
                            "Accept": "application/jsonlines",
                            "AssembleWith": "Line"
                        },
                        "TransformResources": {
                            "InstanceType": "xxx",
                            "InstanceCount": 1
                        }
                    },
                    "End": true
                }
            }
        }
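The full definition chains the two states through ResultPath: the Lambda's return value is written into the state input at $.executionid, and the transform state reads it back through "TransformJobName.$". A simplified Python model of that hand-off, assuming a top-level ResultPath only; apply_result_path is a hypothetical helper, and the real Lambda is replaced by a local stand-in function with a made-up ARN.

```python
import copy
from typing import Any


def apply_result_path(state_input: dict, result: Any, result_path: str) -> Any:
    """Hypothetical, simplified ResultPath: '$' replaces the input, '$.key' adds a top-level key."""
    if result_path == "$":
        return result
    if not result_path.startswith("$.") or "." in result_path[2:]:
        raise ValueError("this sketch only supports '$' or a single top-level key")
    output = copy.deepcopy(state_input)
    output[result_path[2:]] = result
    return output


def fake_lambda(event: dict) -> str:
    """Stand-in for the randomstring Lambda: keep the execution name from the ARN."""
    return event["executionId"].split(":")[-1]


context = {"Execution": {"Id": "arn:aws:states:eu-central-1:123456789012:execution:my-state-machine:daily-run-42"}}
execution_input: dict = {}

# State 1: "executionId.$": "$$.Execution.Id", then "ResultPath": "$.executionid"
lambda_result = fake_lambda({"executionId": context["Execution"]["Id"]})
state2_input = apply_result_path(execution_input, lambda_result, "$.executionid")

# State 2: "TransformJobName.$": "$.executionid" reads the value back from its input
transform_job_name = state2_input["executionid"]
print(transform_job_name)  # daily-run-42
```

Because ResultPath adds a key instead of replacing the whole input, any original execution input is still available to the transform state alongside executionid.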
    

    In the above definition, the Lambda function can simply parse the execution ID (an ARN), which is passed to it through the Parameters section, and return the execution name, i.e. the last colon-separated part of the ARN:

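        def lambda_handler(event, context):
            # Execution ARN format: arn:aws:states:<region>:<account>:execution:<state-machine>:<name>
            return event['executionId'].split(':')[-1]  # keep only the execution name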
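The handler can be exercised locally before deploying it by calling it with a hand-written event. A minimal sketch: the ARN below is made up, context is passed as None because the handler does not use it, and rsplit(":", 1) is only a stylistic alternative to split(":") that gives the same result, since execution names cannot contain colons.

```python
def lambda_handler(event, context):
    # Execution ARN format: arn:aws:states:<region>:<account>:execution:<state-machine>:<name>
    # Split once from the right and keep the trailing execution name.
    return event["executionId"].rsplit(":", 1)[-1]


if __name__ == "__main__":
    # Made-up event shaped like the Parameters payload {"executionId.$": "$$.Execution.Id"}
    sample_event = {
        "executionId": "arn:aws:states:us-east-1:123456789012:execution:my-state-machine:3f2c9a1e-0d4b-4c8e-9b7a-1e2d3c4b5a69"
    }
    print(lambda_handler(sample_event, None))  # 3f2c9a1e-0d4b-4c8e-9b7a-1e2d3c4b5a69
```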
    

    Or, if you don't want to pass the execution ID, the function can return a random string instead, for example:

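        import random
        import string

        def lambda_handler(event, context):
            # 16 random uppercase letters/digits, so each execution gets a different job name
            return ''.join(random.choices(string.ascii_uppercase + string.digits, k=16))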
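Because the job runs daily, a variant that combines a fixed prefix, the UTC date and a short random token keeps names unique while making them easy to sort in the SageMaker console. This is a sketch under stated assumptions: the prefix example-jobname comes from the question, the name format is an arbitrary choice, and secrets.token_hex is used simply as a convenient source of random hex characters.

```python
import secrets
from datetime import datetime, timezone


def lambda_handler(event, context):
    # Hypothetical naming scheme: <prefix>-<YYYYMMDD>-<8 hex chars>, e.g. example-jobname-20240101-1a2b3c4d
    prefix = "example-jobname"  # taken from the question; adjust as needed
    date_part = datetime.now(timezone.utc).strftime("%Y%m%d")
    token = secrets.token_hex(4)  # 4 random bytes -> 8 lowercase hex characters
    # 15 + 1 + 8 + 1 + 8 = 33 characters, well under the 63-character limit
    return f"{prefix}-{date_part}-{token}"
```

The date alone would not be enough if the state machine is ever re-run on the same day, which is why the random token is kept.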
    

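    You can generate any string you like in the Lambda function and use it as the transform job name, as long as it is unique within your account and region and satisfies the SageMaker constraints (at most 63 characters, matching ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}).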
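If the string you generate comes from something you don't fully control (for example a custom execution name), it may help to normalise it before using it as the job name. A minimal sketch of one possible approach, assuming that replacing disallowed characters with hyphens is acceptable for your naming scheme; to_transform_job_name is a hypothetical helper:

```python
import re

MAX_JOB_NAME_LENGTH = 63
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def to_transform_job_name(raw: str, prefix: str = "transform") -> str:
    """Hypothetical helper: map an arbitrary string onto the SageMaker job-name constraints."""
    # Replace every run of characters outside [a-zA-Z0-9] with a single hyphen.
    cleaned = re.sub(r"[^a-zA-Z0-9]+", "-", f"{prefix}-{raw}")
    # Truncate first, then strip hyphens so the name starts and ends with an alphanumeric.
    cleaned = cleaned[:MAX_JOB_NAME_LENGTH].strip("-")
    if not JOB_NAME_PATTERN.fullmatch(cleaned):
        raise ValueError(f"cannot build a valid job name from {raw!r}")
    return cleaned


if __name__ == "__main__":
    print(to_transform_job_name("run_1"))  # transform-run-1
```

Sanitising and truncating can map different inputs onto the same name, so keep the unique part of the string short and near the front.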