I have three main processes to perform using Amazon SageMaker.
-> For this, I have referred to this link:
Bring own algorithm to AWS sagemaker
It seems we can bring our own training script into SageMaker's managed training setup, and model artifacts can be uploaded to S3, etc.
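As a rough sketch of that bring-your-own-script setup (the bucket name, role, VPC values, and the choice of the scikit-learn script-mode container are my assumptions, not from the linked post; LightGBM can be installed into the container via a requirements.txt next to train.py):

```python
# Hedged sketch: launching train.py as a SageMaker script-mode training job.
# All names, ARNs, and paths below are placeholders.

def artifact_path(bucket: str, prefix: str = "model-artifacts") -> str:
    """Build the S3 URI where SageMaker will upload model artifacts."""
    return f"s3://{bucket}/{prefix}"

def launch_training(role_arn: str, bucket: str, subnets, security_group_ids):
    # Imported inside the function so the pure helper above works even
    # without the SageMaker SDK installed.
    from sagemaker.sklearn import SKLearn

    estimator = SKLearn(
        entry_point="train.py",             # your own training script
        framework_version="1.2-1",
        instance_type="ml.m5.xlarge",
        instance_count=1,
        role=role_arn,
        output_path=artifact_path(bucket),  # artifacts land in S3 automatically
        subnets=subnets,                    # run inside your private VPC
        security_group_ids=security_group_ids,
    )
    estimator.fit({"train": f"s3://{bucket}/input/train.csv"})
```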
Note: I am using a LightGBM model for training.
-> There is no need to deploy the model or create an endpoint, because training will happen every day and the forecast will be generated as soon as training completes. (I need to generate the forecast in train.py itself.)
--> The challenge is how to write the forecast to an AWS RDS database from the train.py script, given that the script runs in a private VPC.
--> I have gone through AWS Step Functions, and it seems to be the way to trigger daily training and write the forecast to RDS.
--> The challenge is how to use a Step Function with a time-based trigger rather than an event-based one.
Any suggestions on how to do this? Any best practices to follow? Thank you in advance.
The way to trigger Step Functions on a schedule is by using CloudWatch Events (a sort of cron, now part of Amazon EventBridge). Check out this tutorial: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-cloudwatch-events-target.html
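A minimal sketch of that schedule with boto3 (rule name, hour, and ARNs are placeholders; the cron syntax is the six-field CloudWatch Events/EventBridge format, and the role must allow events.amazonaws.com to start executions of your state machine):

```python
def daily_cron(hour_utc: int, minute: int = 0) -> str:
    # EventBridge cron fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} * * ? *)"

def schedule_state_machine(rule_name: str, state_machine_arn: str,
                           role_arn: str, hour_utc: int = 2):
    # All ARNs here are assumptions; calling this requires AWS credentials.
    import boto3

    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=daily_cron(hour_utc))
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "trigger-daily-training",
            "Arn": state_machine_arn,
            "RoleArn": role_arn,  # role EventBridge assumes to call StartExecution
        }],
    )
```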
Don't write to the RDS database from your Python code! It is better to write the output to S3 and then "copy" the files from S3 into RDS. Decoupling these batches will make the process more reliable and scalable. You can trigger the bulk copy into RDS when the files are written to S3, or at a later time when your DB is not too busy.
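One way to sketch that copy step, assuming the target is RDS for PostgreSQL with the aws_s3 extension enabled and a Lambda subscribed to the S3 put event (the table name, DB_DSN environment variable, and handler shape are all my assumptions):

```python
import os

def import_sql(table: str, bucket: str, key: str,
               region: str = "us-east-1") -> str:
    # The aws_s3 extension on RDS for PostgreSQL can bulk-load a CSV
    # directly from S3 into an existing table.
    return (
        "SELECT aws_s3.table_import_from_s3("
        f"'{table}', '', '(format csv)', "
        f"aws_commons.create_s3_uri('{bucket}', '{key}', '{region}'))"
    )

def handler(event, context):
    # Hypothetical Lambda entry point, triggered when train.py writes the
    # forecast CSV to S3. Connection details come from the DB_DSN env var.
    import psycopg2  # assumed to be packaged with the Lambda

    rec = event["Records"][0]["s3"]
    sql = import_sql("forecast", rec["bucket"]["name"], rec["object"]["key"])
    with psycopg2.connect(os.environ["DB_DSN"]) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
```

For Aurora MySQL the equivalent would be a LOAD DATA FROM S3 statement instead; either way the Lambda (or a Step Functions task) only runs the bulk load, so train.py never needs DB credentials.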