
AWS Managed Apache Airflow (MWAA) or AWS Batch for a simple batch job flow


I have a simple workflow to design: four batch jobs run sequentially, one after another, and each job runs in a multi-node master/slave architecture.

My question: AWS Batch can manage a simple workflow (using job queues and job dependencies) and can run multi-node parallel jobs as well. So, should I use AWS Batch or Airflow?
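For illustration, the sequential part can be expressed directly in AWS Batch with job dependencies: each `submit_job` call passes the previous job's ID in `dependsOn`, so a step starts only after the one before it succeeds. This is a minimal Python sketch, assuming a boto3 Batch client is passed in; the function name, queue name, and job-definition names are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional

# Hypothetical names: replace with your own job queue and job definitions.
JOB_QUEUE = "my-batch-queue"
JOB_DEFINITIONS = ["step-1-def", "step-2-def", "step-3-def", "step-4-def"]


def submit_sequential_pipeline(batch_client: Any, queue: str, definitions: List[str]) -> List[str]:
    """Submit one AWS Batch job per definition, each depending on the previous one.

    batch_client is expected to behave like boto3.client("batch"): its
    submit_job(...) call returns a dict that contains "jobId".
    """
    job_ids: List[str] = []
    previous_id: Optional[str] = None
    for index, definition in enumerate(definitions, start=1):
        request: Dict[str, Any] = {
            "jobName": f"pipeline-step-{index}",
            "jobQueue": queue,
            "jobDefinition": definition,
        }
        if previous_id is not None:
            # AWS Batch keeps this job PENDING until the previous job succeeds.
            request["dependsOn"] = [{"jobId": previous_id}]
        response = batch_client.submit_job(**request)
        previous_id = response["jobId"]
        job_ids.append(previous_id)
    return job_ids
```

With real credentials, usage would look like `submit_sequential_pipeline(boto3.client("batch"), JOB_QUEUE, JOB_DEFINITIONS)`. All four jobs are submitted up front, and AWS Batch enforces the ordering, so no separate scheduler process has to stay running.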

With Airflow, I can use the KubernetesPodOperator, and each job will run in a Kubernetes cluster. However, Airflow does not inherently support multi-node parallel jobs.

Note: The batch jobs are written in Java using the Spring Batch remote partitioning framework, which supports a master/slave architecture.
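For illustration, this is roughly how the master/slave split maps onto an AWS Batch multi-node parallel job. AWS Batch sets environment variables in every node's container, so each node can work out whether it is the main node (the Spring Batch manager) or a child node (a worker). The sketch below is in Python for brevity, and `NodeRole` and `detect_node_role` are hypothetical names. In the Java job, the same variables could be read with `System.getenv` and used, for example, to select a hypothetical `manager` or `worker` Spring profile.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NodeRole:
    is_main: bool                     # True on the node that should act as master/manager
    node_index: int                   # this node's index within the multi-node job
    num_nodes: int                    # total number of nodes in the job
    main_node_address: Optional[str]  # main node's private IPv4 (set on child nodes only)


def detect_node_role(env: Mapping[str, str] = os.environ) -> NodeRole:
    """Read the variables AWS Batch sets in multi-node parallel job containers."""
    node_index = int(env["AWS_BATCH_JOB_NODE_INDEX"])
    main_index = int(env["AWS_BATCH_JOB_MAIN_NODE_INDEX"])
    return NodeRole(
        is_main=node_index == main_index,
        node_index=node_index,
        num_nodes=int(env["AWS_BATCH_JOB_NUM_NODES"]),
        main_node_address=env.get("AWS_BATCH_JOB_MAIN_NODE_PRIVATE_IPV4_ADDRESS"),
    )
```

Because the role comes from the environment, one container image can serve both the master and the slave nodes.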


Solution

  • AWS Batch would fit your requirements better.

    Airflow is a workflow orchestration tool: it is used to host many jobs (DAGs) with multiple tasks each, where each task is usually light on processing. Its most common use is ETL, but in your use case you would be running an entire Airflow environment for just one four-step workflow, and (unless you manually broke each job out into smaller tasks) Airflow itself would not spread a job's work across multiple nodes.

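    AWS Batch, on the other hand, is built for batch processing, and it lets you fine-tune the servers/nodes your code executes on (instance types, vCPUs, memory, and the number of nodes in a multi-node parallel job). I think in your use case it would also work out cheaper than Airflow, since AWS Batch charges only for the underlying compute resources, while an MWAA environment is billed for as long as it runs.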
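As an example of that fine-tuning, a multi-node parallel job definition can size the main node and the worker nodes separately. The sketch below only builds the keyword arguments for boto3's `register_job_definition`. The function name, image, and sizes are hypothetical. Multi-node parallel jobs also need an EC2-based compute environment, because they are not supported on Fargate.

```python
from typing import Any, Dict


def multinode_job_definition(
    name: str,
    image: str,
    num_nodes: int,
    main_vcpus: int,
    main_memory_mib: int,
    worker_vcpus: int,
    worker_memory_mib: int,
) -> Dict[str, Any]:
    """Build kwargs for batch.register_job_definition (type "multinode").

    Node 0 is the main (master/manager) node; nodes 1..num_nodes-1 are the
    workers. Each group gets its own vCPU and memory (MiB) settings.
    """
    if num_nodes < 2:
        raise ValueError("a master/worker layout needs at least two nodes")

    def container(vcpus: int, memory_mib: int) -> Dict[str, Any]:
        # AWS Batch expects resource values as strings.
        return {
            "image": image,
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        }

    return {
        "jobDefinitionName": name,
        "type": "multinode",
        "nodeProperties": {
            "numNodes": num_nodes,
            "mainNode": 0,
            "nodeRangeProperties": [
                # "0:0" targets only the main node; "1:" targets node 1 up to the last node.
                {"targetNodes": "0:0", "container": container(main_vcpus, main_memory_mib)},
                {"targetNodes": "1:", "container": container(worker_vcpus, worker_memory_mib)},
            ],
        },
    }
```

Registering it would look like `boto3.client("batch").register_job_definition(**multinode_job_definition(...))`. One such definition per step, chained with job dependencies, would cover the four-job flow without an Airflow environment.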