amazon-sagemaker endpoint data-quality data-capture amazon-sagemaker-clarify

Model Monitor Capture data - EndpointOutput Encoding is BASE64


https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-capture-endpoint.html

I followed the steps in this link, and it appears I cannot change the encoding of EndpointOutput in the data capture file. For my XGBoost model it comes out as BASE64. I am using the latest version, 1.2.3.

The monitoring schedule requires EndpointInput and EndpointOutput to have the same encoding. My EndpointInput is CSV, but EndpointOutput comes out as BASE64, and nothing I try changes it.

This breaks the analyzer. After the baseline is generated and data has been captured, the monitoring schedule runs the analyzer, which fails with an encoding mismatch error.
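To make the mismatch concrete, here is a minimal sketch that reads one data-capture record and reports each side's encoding, base64-decoding the payload where needed. The record layout (captureData, endpointInput/endpointOutput, encoding, data) is assumed from typical capture files, and every value in SAMPLE_LINE is made up for illustration.

```python
import base64
import json

# Hypothetical capture record shaped like one line of the JSON Lines files
# that data capture writes to S3 (layout assumed; values are made up).
SAMPLE_LINE = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "5.1,3.5,1.4,0.2",
            "encoding": "CSV",
        },
        "endpointOutput": {
            "observedContentType": "text/csv",
            "mode": "OUTPUT",
            "data": base64.b64encode(b"0.93").decode("ascii"),
            "encoding": "BASE64",
        },
    },
    "eventMetadata": {"eventId": "example-id"},
    "eventVersion": "0",
})


def decode_side(side: dict) -> str:
    """Return the payload of one side of a capture record as text."""
    if side["encoding"] == "BASE64":
        return base64.b64decode(side["data"]).decode("utf-8")
    # CSV and JSON payloads are already stored as plain text
    return side["data"]


def inspect_record(line: str) -> dict:
    """Report each side's encoding and decoded payload, and whether they match."""
    capture = json.loads(line)["captureData"]
    inp, out = capture["endpointInput"], capture["endpointOutput"]
    return {
        "input_encoding": inp["encoding"],
        "output_encoding": out["encoding"],
        "input": decode_side(inp),
        "output": decode_side(out),
        "encodings_match": inp["encoding"] == out["encoding"],
    }


if __name__ == "__main__":
    print(inspect_record(SAMPLE_LINE))
```

Running this over a real capture file is a quick way to confirm which side is base64-encoded before the analyzer complains.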

As far as I can tell, there is no way to change the encoding of the output. I also tried the LightGBM and CatBoost algorithms, and for these the EndpointOutput encoding is JSON, which is readable but still does not solve the problem.

Is there a way to change the EndpointOutput encoding for DataCapture?

I have tried setting a deserializer on the predictor, both JSONDeserializer and CSVDeserializer, but I still get BASE64 encoding with XGBoost and JSON encoding with the LightGBM and CatBoost algorithms.


Solution

  • You can resolve the issue with your XGBoost model by adding the CaptureContentTypeHeader element to the DataCaptureConfig of your real-time SageMaker endpoint. CaptureContentTypeHeader tells data capture which content types to record as CSV or JSON text; if no headers are specified, SageMaker base64-encodes the captured data by default. Here's the updated configuration:

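    # Placeholder values; replace with your own
    initial_sampling_percentage = 100
    s3_capture_upload_path = "s3://<your-bucket>/<prefix>/datacapture"
    capture_modes = ["Input", "Output"]

    DataCaptureConfig = {
        'EnableCapture': True,
        'InitialSamplingPercentage': initial_sampling_percentage,
        'DestinationS3Uri': s3_capture_upload_path,
        'CaptureOptions': [{"CaptureMode": capture_mode} for capture_mode in capture_modes],
        # Payloads with these content types are captured as JSON text instead of base64
        'CaptureContentTypeHeader': {"JsonContentTypes": ["application/json"]}
    }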
    

    With CaptureContentTypeHeader listing application/json under JsonContentTypes, captured payloads sent with that content type are stored as JSON text rather than base64, as described in the Amazon SageMaker API Reference entry for CaptureContentTypeHeader.

    This should resolve the issue you were facing with the base64 format and allow you to capture the data in the desired JSON format.
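For reference, here is a minimal sketch of building the same DataCaptureConfig programmatically and passing it to create_endpoint_config. It is pure Python so it runs without AWS credentials: the client is injected (in practice, `boto3.client("sagemaker")`). The helper names are hypothetical; the dict keys and the create_endpoint_config keyword arguments follow the boto3 SageMaker API. Listing text/csv under CsvContentTypes is an assumption for a CSV-in/CSV-out model like the XGBoost one in the question. The strings must match the content types your endpoint actually sees, so check the observedContentType fields in an existing capture file first.

```python
from typing import Iterable, List, Optional

# Capture modes accepted by DataCaptureConfig.CaptureOptions
VALID_CAPTURE_MODES = {"Input", "Output", "InputAndOutput"}


def build_data_capture_config(
    destination_s3_uri: str,
    sampling_percentage: int = 100,
    capture_modes: Iterable[str] = ("Input", "Output"),
    csv_content_types: Optional[List[str]] = None,
    json_content_types: Optional[List[str]] = None,
) -> dict:
    """Build a DataCaptureConfig dict (hypothetical helper)."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    modes = list(capture_modes)
    unknown = set(modes) - VALID_CAPTURE_MODES
    if unknown:
        raise ValueError(f"unknown capture modes: {sorted(unknown)}")
    config = {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        "CaptureOptions": [{"CaptureMode": m} for m in modes],
    }
    # Content types listed here are captured as text; others are base64-encoded
    header = {}
    if csv_content_types:
        header["CsvContentTypes"] = list(csv_content_types)
    if json_content_types:
        header["JsonContentTypes"] = list(json_content_types)
    if header:
        config["CaptureContentTypeHeader"] = header
    return config


def create_endpoint_config_with_capture(client, name, production_variants, capture_config):
    """Create an endpoint config with data capture via an injected boto3 SageMaker client."""
    return client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=production_variants,
        DataCaptureConfig=capture_config,
    )
```

Usage might look like `build_data_capture_config("s3://<your-bucket>/datacapture", csv_content_types=["text/csv"], json_content_types=["application/json"])`, with the result passed to `create_endpoint_config_with_capture` along with your production variants. Injecting the client keeps the config logic testable without network access.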