I am experimenting with integrating ML predictions into my RDS Postgres 14.6.7 database. I followed this tutorial and managed to get the data loaded, the SageMaker model created, the SageMaker training completed, and the endpoint deployed. Everything was looking good.
Then I wrote some code to test the endpoint prediction.
import boto3
import json
# Create a Boto3 client for SageMaker
client = boto3.client('sagemaker-runtime')
# Set the name of the SageMaker endpoint to invoke
endpoint_name = 'Custom-sklearn-model-2024-03-26-20-15-12'
# Set the content type of the input data
content_type = 'application/json'
# Hardcoded input data
input_data = [
    [1454.0, 1.0, 0.5, 1.0, 1.0, 0.0, 34.0, 0.7, 83.0, 4.0, 3.0, 250.0, 1033.0, 3419.0, 7.0, 5.0, 5.0, 1.0, 1.0, 0.0],
    [1092.0, 1.0, 0.5, 1.0, 10.0, 0.0, 11.0, 0.5, 167.0, 3.0, 14.0, 468.0, 571.0, 737.0, 14.0, 4.0, 11.0, 0.0, 1.0, 0.0]
]
# Convert the input data to JSON format
payload = json.dumps(input_data)
# Invoke the SageMaker endpoint
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload
)
# Get the response from the SageMaker endpoint
result = response['Body'].read().decode('utf-8')
print(result)
This code printed:
[3, 0]
Everything looked good. I went to my RDS cluster to verify that it has an IAM role for SageMaker. The role is there and Active for SageMaker.
Next, I ran a policy simulation using that role for the InvokeEndpoint action on my SageMaker endpoint. I selected the role, selected the SageMaker service, and set the resource to the ARN of the specific endpoint. Everything passed.
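For reference, the same permission check can be scripted instead of clicked through. This is a minimal sketch, assuming the simulation is done with the IAM `simulate_principal_policy` call; the function name `can_invoke_endpoint` and both ARNs are hypothetical placeholders, and the IAM client is passed in so the helper itself has no AWS dependency.

```python
def can_invoke_endpoint(iam_client, role_arn, endpoint_arn):
    """Return True if the IAM simulation allows sagemaker:InvokeEndpoint.

    iam_client is expected to behave like boto3.client("iam").
    role_arn and endpoint_arn are placeholders supplied by the caller.
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=["sagemaker:InvokeEndpoint"],
        ResourceArns=[endpoint_arn],
    )
    # Each evaluated action/resource pair carries an EvalDecision of
    # "allowed", "explicitDeny", or "implicitDeny".
    decisions = [r["EvalDecision"] for r in response["EvaluationResults"]]
    return bool(decisions) and all(d == "allowed" for d in decisions)
```

It would be called as `can_invoke_endpoint(boto3.client("iam"), role_arn, endpoint_arn)`. A passing result only says the policy permits the call; it says nothing about whether the database can reach the endpoint over the network.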
So my database should now be able to invoke it. In pgAdmin, I installed the aws_ml extension and now have access to the aws_sagemaker.invoke_endpoint function.
When I run this query, I expect the results to be returned, but instead it times out.
SELECT aws_sagemaker.invoke_endpoint (
    'Custom-sklearn-model-2024-03-26-20-15-12',
    1,
    1454.0, 1.0, 0.5, 1.0, 1.0, 0.0, 34.0, 0.7, 83.0, 4.0, 3.0, 250.0, 1033.0, 3419.0, 7.0, 5.0, 5.0, 1.0, 1.0, 0.0
);
This is the error I get:
'ERROR: invoke_endpoint failed with error message: "curlCode: 28, Timeout was reached"'
I checked the CloudWatch logs for the endpoint, and the request never reaches it. I can't figure out what is wrong. It seems like the database isn't using the role for SageMaker. In some tutorials for MySQL, there is an "aws_default_sagemaker_role" parameter in the cluster parameter group. Unfortunately, that parameter isn't available to me, and I'm not sure whether I am missing some configuration for the default SageMaker role.
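curlCode 28 is libcurl's "operation timed out" code, and together with nothing showing up in the endpoint's logs, one thing worth ruling out is the network path rather than the role. The sketch below is only an approximation of that check: it assumes it is run from a host in the same VPC, subnets, and security group as the database (it cannot run inside RDS itself), and the region passed in is whatever region the endpoint lives in. The helper names are hypothetical.

```python
import socket


def sagemaker_runtime_host(region):
    """Build the regional SageMaker runtime hostname for a given region."""
    return f"runtime.sagemaker.{region}.amazonaws.com"


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False
```

For example, `can_reach(sagemaker_runtime_host("us-east-1"))` returning False from such a host would point toward routing or security group rules rather than IAM.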
I needed to modify the outbound rules of the security group that the database instance was using. I opened all outbound traffic for it to validate this, and it seems to work.
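Opening all outbound traffic is enough to confirm the cause, but a narrower rule may be preferable once it is confirmed. The following is a minimal sketch, assuming the database only needs outbound HTTPS (TCP 443) to reach the SageMaker runtime; the helper names, the security group ID, and the default CIDR are hypothetical, and the EC2 client is passed in by the caller.

```python
def https_egress_rule(cidr="0.0.0.0/0"):
    """Build an outbound rule that allows HTTPS (TCP 443) to the given CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [
            {"CidrIp": cidr, "Description": "HTTPS out to SageMaker runtime"}
        ],
    }


def allow_https_egress(ec2_client, security_group_id, cidr="0.0.0.0/0"):
    """Add the HTTPS outbound rule to a security group.

    ec2_client is expected to behave like boto3.client("ec2").
    security_group_id is a placeholder such as the group attached to the DB.
    """
    return ec2_client.authorize_security_group_egress(
        GroupId=security_group_id,
        IpPermissions=[https_egress_rule(cidr)],
    )
```

It would be called as `allow_https_egress(boto3.client("ec2"), "sg-...")` with the ID of the group attached to the database instance. Whether port 443 alone is sufficient in a given VPC is an assumption to verify by rerunning the query.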