amazon-web-services  hadoop-yarn  hadoop2  emr  amazon-emr

Running multiple Hadoop jobs in parallel in YARN


When I submit multiple Hadoop jobs to an EMR cluster, they run one after the other instead of concurrently (I can see the progress using yarn application -list).

  1. Is there a way to run all these Hadoop jobs in parallel?
  2. Will passing multiple Hadoop jobs in a single step solve this issue? If yes, how do I pass multiple jobs within a single step?

Solution

  • If you use the HadoopActivity in AWS Data Pipeline with either the fair scheduler or the capacity scheduler, you can run multiple Hadoop jobs in parallel on the same EMR cluster.

    https://aws.amazon.com/about-aws/whats-new/2015/06/run-parallel-hadoop-jobs-on-your-amazon-emr-cluster-using-aws-data-pipeline/
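
    As a rough illustration of the approach above, the following sketch builds a fragment of a Data Pipeline definition in which two HadoopActivity objects share one EmrCluster that uses the fair scheduler. The field names (hadoopSchedulerType, jarUri, mainClass, argument, runsOn) are written as I understand them from the Data Pipeline object reference, so check them against the current documentation. The bucket paths, class names, and ids are hypothetical, and a real definition also needs a schedule, roles, and cluster sizing, which are omitted here.

```python
import json


def hadoop_activity(activity_id, jar_uri, main_class, args, cluster_id):
    """Build one HadoopActivity object that runs on the shared cluster."""
    return {
        "id": activity_id,
        "name": activity_id,
        "type": "HadoopActivity",
        "jarUri": jar_uri,          # hypothetical S3 location of the job jar
        "mainClass": main_class,    # hypothetical driver class
        "argument": list(args),
        "runsOn": {"ref": cluster_id},
    }


def build_definition(jobs, scheduler="PARALLEL_FAIR_SCHEDULING"):
    """Return a definition fragment: one EmrCluster plus one activity per job.

    Assumption: the scheduler is selected on the EmrCluster object via
    hadoopSchedulerType (PARALLEL_FAIR_SCHEDULING or
    PARALLEL_CAPACITY_SCHEDULING).
    """
    cluster_id = "SharedEmrCluster"  # hypothetical id
    cluster = {
        "id": cluster_id,
        "name": cluster_id,
        "type": "EmrCluster",
        "hadoopSchedulerType": scheduler,
    }
    activities = [
        hadoop_activity(job_id, jar, cls, args, cluster_id)
        for job_id, jar, cls, args in jobs
    ]
    return {"objects": [cluster] + activities}


# Two independent jobs with no dependsOn between them, so nothing in the
# definition forces them to run one after the other.
example_jobs = [
    ("JobA", "s3://example-bucket/jobs/a.jar", "com.example.JobA",
     ["s3://example-bucket/in-a", "s3://example-bucket/out-a"]),
    ("JobB", "s3://example-bucket/jobs/b.jar", "com.example.JobB",
     ["s3://example-bucket/in-b", "s3://example-bucket/out-b"]),
]

print(json.dumps(build_definition(example_jobs), indent=2))
```

    The printed JSON is only a starting point for a pipeline definition file. Whether the jobs actually overlap still depends on the cluster having enough free YARN resources for more than one application at a time.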