I'm currently developing an ETL pipeline for my ML model on AWS. I want to trigger a Lambda function when a SageMaker Processing Job finishes, and the event passed to the Lambda should contain the Processing Job's configuration (job name, arguments, etc.).
Q1: How can I trigger the Lambda when the Processing Job finishes?
Q2: How can I pass the Processing Job's configuration to the Lambda as its event?
You can use the following EventBridge rule pattern:
```json
{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Processing Job State Change"],
  "detail": {
    "ProcessingJobStatus": ["Failed", "Completed", "Stopped"]
  }
}
```
Adjust the `ProcessingJobStatus` list to the statuses you want to handle. A processing job can report `InProgress`, `Completed`, `Failed`, `Stopping`, or `Stopped`.
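If you generate the rule from code, a small helper can build the pattern and catch typos in the status names before they silently match nothing. This is a minimal Python sketch; `build_event_pattern` is a hypothetical helper name, and the allowed values are the ones SageMaker uses for `ProcessingJobStatus`.

```python
import json

# Values SageMaker can report in ProcessingJobStatus.
VALID_STATUSES = {"InProgress", "Completed", "Failed", "Stopping", "Stopped"}


def build_event_pattern(statuses):
    """Return an EventBridge event pattern (as a JSON string) that matches
    SageMaker Processing Job state changes with any of the given statuses."""
    unknown = set(statuses) - VALID_STATUSES
    if unknown:
        # A misspelled status would never match, so fail loudly instead.
        raise ValueError(f"Unknown ProcessingJobStatus values: {sorted(unknown)}")
    pattern = {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Processing Job State Change"],
        "detail": {"ProcessingJobStatus": list(statuses)},
    }
    return json.dumps(pattern)


# Example: only react to jobs that completed or failed.
print(build_event_pattern(["Completed", "Failed"]))
```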
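Set your Lambda function as the target of the EventBridge rule. EventBridge then invokes the function with the matched event as its input, so the Processing Job's configuration arrives in the event's `detail` field.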
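The rule and its target can also be created with boto3. A minimal sketch, assuming the Lambda function already exists and you have its ARN; the function name `wire_rule_to_lambda`, the target `Id`, and the `StatementId` format are hypothetical choices. The boto3 clients are passed in so the function can be exercised with stand-ins. The `add_permission` call is needed because EventBridge must be allowed to invoke the function.

```python
def wire_rule_to_lambda(events_client, lambda_client, rule_name, event_pattern, function_arn):
    """Create or update an EventBridge rule and make a Lambda function its target.

    events_client and lambda_client are boto3 clients for "events" and "lambda";
    they are passed in rather than created here so the function is easy to test.
    event_pattern is the event pattern as a JSON string.
    """
    # put_rule creates the rule, or updates it if a rule with this name exists.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=event_pattern,
        State="ENABLED",
    )["RuleArn"]

    # Register the Lambda function as the rule's target (hypothetical target Id).
    resp = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "processing-job-lambda", "Arn": function_arn}],
    )
    if resp.get("FailedEntryCount", 0):
        raise RuntimeError(f"put_targets failed: {resp.get('FailedEntries')}")

    # Allow EventBridge to invoke the function, restricted to this rule.
    # Raises ResourceConflictException if the StatementId is already in use.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

For example, `wire_rule_to_lambda(boto3.client("events"), boto3.client("lambda"), "processing-job-finished", pattern_json, function_arn)`, where the rule name is a placeholder and `pattern_json` is the pattern above serialized with `json.dumps`. Re-running `put_rule` and `put_targets` updates the existing rule and target, but `add_permission` fails if a statement with the same `StatementId` already exists.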
Here is a sample event that will be passed to your Lambda, taken from the AWS console:
```json
{
  "version": "0",
  "id": "0a15f67d-aa23-0123-0123-01a23w89r01t",
  "detail-type": "SageMaker Processing Job State Change",
  "source": "aws.sagemaker",
  "account": "123456789012",
  "time": "2019-05-31T21:49:54Z",
  "region": "us-east-1",
  "resources": ["arn:aws:sagemaker:us-west-2:012345678987:processing-job/integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02"],
  "detail": {
    "ProcessingInputs": [{
      "InputName": "InputName",
      "S3Input": {
        "S3Uri": "s3://input/s3/uri",
        "LocalPath": "/opt/ml/processing/input/local/path",
        "S3DataType": "MANIFEST_FILE",
        "S3InputMode": "PIPE",
        "S3DataDistributionType": "FULLYREPLICATED"
      }
    }],
    "ProcessingOutputConfig": {
      "Outputs": [{
        "OutputName": "OutputName",
        "S3Output": {
          "S3Uri": "s3://output/s3/uri",
          "LocalPath": "/opt/ml/processing/output/local/path",
          "S3UploadMode": "CONTINUOUS"
        }
      }],
      "KmsKeyId": "KmsKeyId"
    },
    "ProcessingJobName": "integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02",
    "ProcessingResources": {
      "ClusterConfig": {
        "InstanceCount": 3,
        "InstanceType": "ml.c5.xlarge",
        "VolumeSizeInGB": 5,
        "VolumeKmsKeyId": "VolumeKmsKeyId"
      }
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 2000
    },
    "AppSpecification": {
      "ImageUri": "012345678901.dkr.ecr.us-west-2.amazonaws.com/processing-uri:latest"
    },
    "NetworkConfig": {
      "EnableInterContainerTrafficEncryption": true,
      "EnableNetworkIsolation": false,
      "VpcConfig": {
        "SecurityGroupIds": ["SecurityGroupId1", "SecurityGroupId2", "SecurityGroupId3"],
        "Subnets": ["Subnet1", "Subnet2"]
      }
    },
    "RoleArn": "arn:aws:iam::012345678987:role/SageMakerPowerUser",
    "ExperimentConfig": {},
    "ProcessingJobArn": "arn:aws:sagemaker:us-west-2:012345678987:processing-job/integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02",
    "ProcessingJobStatus": "Completed",
    "LastModifiedTime": 1589879735000,
    "CreationTime": 1589879735000
  }
}
```
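On the Lambda side, the job configuration is read from `event["detail"]`. Below is a minimal handler sketch, assuming the field names shown in the sample event; the shape of the returned dictionary is made up for illustration. `AppSpecification.ContainerArguments` does not appear in the sample above, so reading it from the event is an assumption (it is treated as optional here).

```python
def lambda_handler(event, context):
    """Lambda entry point for the EventBridge rule.

    EventBridge passes the whole event as `event`; the processing job's
    configuration is under event["detail"].
    """
    detail = event["detail"]
    app = detail.get("AppSpecification", {})
    job = {
        "name": detail["ProcessingJobName"],
        "status": detail["ProcessingJobStatus"],
        "image": app.get("ImageUri"),
        # Assumption: only present if the job was started with container arguments.
        "arguments": app.get("ContainerArguments", []),
        "inputs": [
            i["S3Input"]["S3Uri"]
            for i in detail.get("ProcessingInputs", [])
            if "S3Input" in i
        ],
        "outputs": [
            o["S3Output"]["S3Uri"]
            for o in detail.get("ProcessingOutputConfig", {}).get("Outputs", [])
            if "S3Output" in o
        ],
    }
    if job["status"] != "Completed":
        # Failed or stopped jobs: log and skip the downstream ETL step.
        print(f"Processing job {job['name']} ended with status {job['status']}")
        return {"handled": False, **job}
    # Continue the ETL pipeline here, e.g. using job["outputs"].
    return {"handled": True, **job}
```

If a field you need is not in the event, the handler can fetch the full job definition with `boto3.client("sagemaker").describe_processing_job(ProcessingJobName=job["name"])`, which returns fields such as `AppSpecification` and `Environment`.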
Edit:
If you only want to match jobs whose ProcessingJobName starts with a specific prefix:
```json
{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Processing Job State Change"],
  "detail": {
    "ProcessingJobStatus": ["Failed", "Completed", "Stopped"],
    "ProcessingJobName": [{
      "prefix": "standarize-data"
    }]
  }
}
```
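To check which job names the prefix filter lets through, EventBridge's `TestEventPattern` API (`test_event_pattern` on the boto3 `events` client) runs a pattern against a sample event using the real matcher. For purely offline unit tests, here is a simplified local matcher. It is my own sketch, not EventBridge's implementation: it only handles nested objects, lists of exact values, and `{"prefix": ...}` filters, and does not handle array-valued event fields or the other filter types.

```python
def _matches_value(allowed, value):
    """True if value equals a literal in `allowed` or starts with a {"prefix": ...} entry."""
    for rule in allowed:
        if isinstance(rule, dict) and "prefix" in rule:
            if isinstance(value, str) and value.startswith(rule["prefix"]):
                return True
        elif rule == value:
            return True
    return False


def matches(pattern, event):
    """Simplified local check of an EventBridge-style pattern against an event.

    Every key in the pattern must be present in the event. Nested dicts are
    matched recursively; lists are alternatives (exact values or prefixes).
    """
    for key, allowed in pattern.items():
        if key not in event:
            return False
        if isinstance(allowed, dict):
            if not isinstance(event[key], dict) or not matches(allowed, event[key]):
                return False
        elif not _matches_value(allowed, event[key]):
            return False
    return True


pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Processing Job State Change"],
    "detail": {
        "ProcessingJobStatus": ["Failed", "Completed", "Stopped"],
        "ProcessingJobName": [{"prefix": "standarize-data"}],
    },
}
```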