amazon-emr aws-glue

Scheduling over different AWS Components - Glue and EMR


How would I tackle the following on AWS, or is it not possible?

  • Transient EMR Cluster for some bulk Spark processing
  • When that cluster terminates, then and only then use a Glue Job to do some limited processing

I am not convinced that AWS Glue Triggers will help across services, since the upstream work runs on EMR rather than in Glue.

Or is this simply not a good use case, so that one should just keep everything in the EMR cluster? Glue can write to SAP HANA with an appropriate connector, and loading Redshift via a Glue job with Redshift Spectrum is a common use case.


Solution

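  • You can use the "Run a Job" (.sync) service integration pattern in AWS Step Functions. Step Functions integrates with both EMR and Glue, and with this pattern a state waits for the EMR step or Glue job to finish before the workflow moves on. Please refer to the AWS Step Functions documentation on service integrations for details.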
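As a rough illustration of that approach, the sketch below builds an Amazon States Language definition in Python: create a transient EMR cluster, run one Spark step, terminate the cluster, and only then start a Glue job. The `.sync` resource ARNs are the Step Functions integrations for EMR and Glue. Everything else is an assumption made for illustration and not taken from the question: the function name `build_definition`, the state names, the cluster name, release label, instance types and counts, the default EMR role names, and the example S3 path and Glue job name.

```python
import json


def _terminate(next_state):
    # Terminate the cluster created earlier and wait until it is gone.
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
        "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
        "ResultPath": "$.terminate",
        "Next": next_state,
    }


def build_definition(spark_script_uri, glue_job_name):
    """Return a state machine definition (as a dict) for EMR followed by Glue."""
    return {
        "Comment": "Transient EMR cluster, then a Glue job",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    # All values below are placeholders (assumptions).
                    "Name": "transient-spark-cluster",
                    "ReleaseLabel": "emr-6.15.0",
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        # Keep the cluster up until the workflow terminates it.
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                "ResultPath": "$.cluster",
                "Next": "RunSparkStep",
            },
            "RunSparkStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "bulk-spark-processing",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode",
                                     "cluster", spark_script_uri],
                        },
                    },
                },
                "ResultPath": "$.step",
                # On failure, still terminate the cluster, but skip Glue.
                "Catch": [{"ErrorEquals": ["States.ALL"],
                           "ResultPath": "$.error",
                           "Next": "TerminateAfterFailure"}],
                "Next": "TerminateCluster",
            },
            "TerminateCluster": _terminate("RunGlueJob"),
            "TerminateAfterFailure": _terminate("SparkStepFailed"),
            "SparkStepFailed": {"Type": "Fail", "Error": "SparkStepFailed"},
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_definition("s3://example-bucket/job.py", "example-glue-job")
    print(json.dumps(definition, indent=2))
```

The printed JSON could be pasted into the Step Functions console, or passed as the `definition` string to boto3's `create_state_machine` call together with an IAM role that is allowed to manage EMR clusters and start Glue job runs. Because each task uses the `.sync` pattern, the Glue state is reached only after the cluster has actually terminated, which matches the "then and only then" requirement in the question.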