Tags: amazon-web-services, machine-learning, batch-processing, amazon-sagemaker

Sagemaker batch transform job failure for 'batchStrategy: MultiRecord' along with data processing


We are using a SageMaker Batch Transform job, and to fit as many records into each mini-batch as will fit within the MaxPayloadInMB limit, we set BatchStrategy to MultiRecord and SplitType to Line.

Input to the SageMaker batch transform job is:

{"requestBody": {"data": {"Age": 90, "Experience": 26, "Income": 30, "Family": 3, "CCAvg": 1}}, "mName": "loanprediction", "mVersion": "1", "testFlag": "false", "environment": "DEV", "transactionId": "5-687sdf87-0bc7e3cb3454dbf261ed1353", "timestamp": "2022-01-15T01:45:32.955Z"}
{"requestBody": {"data": {"Age": 55, "Experience": 26, "Income": 450, "Family": 3, "CCAvg": 1}}, "mName": "loanprediction", "mVersion": "1", "testFlag": "false", "environment": "DEV", "transactionId": "5-69e22778-594916685f4ceca66c08bfbc", "timestamp": "2022-01-15T01:46:32.386Z"}
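With SplitType: Line, each line of this file is treated as one record. InputFilter: $.requestBody then selects only that field as the model input, while JoinSource: Input with OutputFilter: $ merges the model's response back into the original record. A rough conceptual sketch of that per-record flow (not SageMaker's actual implementation; `apply_input_filter` and `join_output` are illustrative names):

```python
import json

def apply_input_filter(line: str) -> str:
    # InputFilter: $.requestBody -- keep only the requestBody field
    record = json.loads(line)
    return json.dumps(record["requestBody"])

def join_output(line: str, model_output: str) -> str:
    # JoinSource: Input -- attach the model response to the original
    # record under "SageMakerOutput"; OutputFilter: $ keeps the whole
    # joined record in the output file
    record = json.loads(line)
    record["SageMakerOutput"] = json.loads(model_output)
    return json.dumps(record)
```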

This is the SageMaker batch transform job config:

apiVersion: sagemaker.aws.amazon.com/v1
kind: BatchTransformJob
metadata:
        generateName: '...-batchtransform'
spec:
        batchStrategy: MultiRecord
        dataProcessing:
                JoinSource: Input
                OutputFilter: $
                inputFilter: $.requestBody
        modelClientConfig:
                invocationsMaxRetries: 0
                invocationsTimeoutInSeconds: 3
        mName: '..'
        region: us-west-2
        transformInput:
                contentType: application/json
                dataSource:
                        s3DataSource:
                                s3DataType: S3Prefix
                                s3Uri: s3://....../part-
                splitType: Line
        transformOutput:
                accept: application/json
                assembleWith: Line
                kmsKeyId: '....'
                s3OutputPath: s3://..../batch_output
        transformResources:
                instanceCount: ..
                instanceType: '..'
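For reference, the same job can also be described through the boto3 CreateTransformJob API. A sketch of the equivalent request (the job name, model name, S3 URIs, and instance settings below are placeholders, not values from the job above):

```python
# Placeholder names and values -- substitute your own resources.
params = {
    "TransformJobName": "loanprediction-batch",
    "ModelName": "my-model",
    "BatchStrategy": "MultiRecord",
    "DataProcessing": {
        "InputFilter": "$.requestBody",
        "OutputFilter": "$",
        "JoinSource": "Input",
    },
    "ModelClientConfig": {
        "InvocationsMaxRetries": 0,
        "InvocationsTimeoutInSeconds": 3,
    },
    "TransformInput": {
        "ContentType": "application/json",
        "SplitType": "Line",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/part-",
            }
        },
    },
    "TransformOutput": {
        "S3OutputPath": "s3://my-bucket/batch_output",
        "Accept": "application/json",
        "AssembleWith": "Line",
    },
    "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
}

# import boto3
# boto3.client("sagemaker", region_name="us-west-2").create_transform_job(**params)
```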

The SageMaker batch transform job fails with:

Error in batch transform data log:

2022-01-27T00:55:39.781:[sagemaker logs]: ephemeral-dev-435945521637/loanprediction-usw2-dev/my-loanprediction/1/my-pipeline-9v28r/part-00001-99fb4b99-e8e7-4945-ac44-b6c5a95a2ffe-c000.txt: 400 Bad Request

Failed to decode JSON object: Extra data: line 2 column 1 (char 163)

Observation: This issue occurs when we provide batchStrategy: MultiRecord in the manifest along with these data processing configs:

dataProcessing:
        JoinSource: Input
        OutputFilter: $
        inputFilter: $.requestBody

NOTE: If we set batchStrategy: SingleRecord along with the aforementioned data processing config, it works fine (the job succeeds)!

Question: How can we achieve a successful run with batchStrategy: MultiRecord along with the aforementioned data processing config?

A successful output with batchStrategy: SingleRecord looks like this:

{"SageMakerOutput":{"prediction":0},"environment":"DEV","transactionId":"5-687sdf87-0bc7e3cb3454dbf261ed1353","mName":"loanprediction","mVersion":"1","requestBody":{"data":{"Age":90,"CCAvg":1,"Experience":26,"Family":3,"Income":30}},"testFlag":"false","timestamp":"2022-01-15T01:45:32.955Z"}
{"SageMakerOutput":{"prediction":0},"environment":"DEV","transactionId":"5-69e22778-594916685f4ceca66c08bfbc","mName":"loanprediction","mVersion":"1","requestBody":{"data":{"Age":55,"CCAvg":1,"Experience":26,"Family":3,"Income":450}},"testFlag":"false","timestamp":"2022-01-15T01:46:32.386Z"}

Relevant resource ARN: arn:aws:sagemaker:us-west-2:435945521637:transform-job/my-pipeline-9v28r-bat-e548fbfb125946528957e0f123456789


Solution

  • When your input data is in JSON Lines format and you choose the SingleRecord BatchStrategy, your container receives a single JSON object per invocation, like this:

    { <some JSON data> }
    

    However, if you use MultiRecord, Batch Transform splits your JSON Lines input (which might contain 100 lines, for example) into multiple records (say, 10 records) that are all sent at once to your container in a single payload, as shown below:

    { <some JSON data> }
    { <some JSON data> }
    { <some JSON data> }
    { <some JSON data> }
    .
    .
    .
    { <some JSON data> }
    

    Therefore, your container must be able to handle this kind of multi-line input. From the error message (Extra data: line 2 column 1), it appears your container's JSON parser fails as soon as it reaches the second record in the payload, i.e. it expects a single JSON object rather than JSON Lines.

    I also noticed that you supplied ContentType and Accept as application/json; for JSON Lines input and output, these should instead be application/jsonlines.

    Could you please test your container to confirm that it can handle multiple JSON Lines records in a single invocation?
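In practice, that means the container's invocations handler should split the request body on newlines and score each line independently. A minimal sketch, assuming the handler receives the payload as a string (`handle_invocations` and the stubbed `predict` are hypothetical names; note that because InputFilter is $.requestBody, each line the container sees is just the requestBody object, e.g. {"data": {...}}):

```python
import json

def predict(features: dict) -> dict:
    # Stand-in for real model inference; replace with your model code.
    return {"prediction": 0}

def handle_invocations(payload: str) -> str:
    # Works for both SingleRecord (one line) and MultiRecord (many lines):
    # parse each non-empty line as its own JSON record.
    outputs = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        outputs.append(json.dumps(predict(record["data"])))
    # Return one JSON line per input record so AssembleWith: Line and
    # JoinSource: Input can match outputs back to inputs.
    return "\n".join(outputs)
```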