amazon-web-services, aws-glue, amazon-sagemaker

When should I use a Glue job versus a SageMaker Processing job for ETL?


I am struggling to decide in which situations a Glue job is preferable to a SageMaker Processing job, and vice versa. Some advice on this topic would be greatly appreciated.

I can do the same thing on both, so why should the difference matter?


Solution

    • Instance choice: if you want to run on a specific EC2 instance type, use SageMaker.
    • Pricing: SageMaker is pro-rated per second, while Glue has a minimum charge (1 min or 10 min, depending on the Glue version). You should measure how much a given workload would cost you on each platform.
    • Customization: in SageMaker Processing you can customize the execution environment, because you provide a Docker image (so you can run more than Spark/Python, such as C++ or R).
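
To make the pricing point concrete, here is a rough sketch of how you might compare the two before measuring for real. The hourly rates, the DPU count, the instance count, and the function names are all placeholders I made up for illustration, not quoted prices, so substitute the current numbers from the pricing pages for your region. It also assumes the job takes the same wall-clock time on both platforms, which will rarely be exactly true.

```python
# Rough cost comparison: per-second billing with an optional minimum billed duration.
# All rates below are PLACEHOLDERS (assumptions), not quoted AWS prices.

GLUE_RATE_PER_DPU_HOUR = 0.44       # assumed placeholder rate per DPU-hour
SM_RATE_PER_INSTANCE_HOUR = 0.23    # assumed placeholder rate per instance-hour


def billed_cost(runtime_s, rate_per_unit_hour, units=1, min_billed_s=0.0):
    """Cost of one run, billed per second but never for less than min_billed_s."""
    billed_s = max(runtime_s, min_billed_s)
    return billed_s / 3600.0 * rate_per_unit_hour * units


def compare(runtime_s, glue_dpus=2, glue_min_s=60, sm_instances=1):
    """Return the estimated cost of the same runtime on each platform."""
    glue = billed_cost(runtime_s, GLUE_RATE_PER_DPU_HOUR, glue_dpus, glue_min_s)
    sagemaker = billed_cost(runtime_s, SM_RATE_PER_INSTANCE_HOUR, sm_instances)
    return {"glue": glue, "sagemaker": sagemaker}


if __name__ == "__main__":
    # A very short job, a medium job, and a one-hour job.
    for runtime in (20, 300, 3600):
        costs = compare(runtime)
        print(f"{runtime:>5}s  glue=${costs['glue']:.4f}  sagemaker=${costs['sagemaker']:.4f}")
```

The minimum charge only matters for short runs: once the runtime exceeds `glue_min_s`, both sides are simply runtime times rate, so for long jobs the comparison comes down to how many DPUs or instances you need and how fast each platform finishes the work. Set `glue_min_s=600` to model the older Glue versions with the 10 min minimum.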