
Flink: fail fast if job parallelism is larger than the total number of slots


The Flink doc says:

A Flink cluster needs exactly as many task slots as the highest parallelism used in the job.

But when I ran the WordCount example job with parallelism=4 on a cluster with only 2 slots (2 TaskManagers with 1 slot each), the Dispatcher still accepted the job and finished some tasks. A few minutes later the job failed with this error:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate enough slots within timeout of 300000 ms to run the job. Please make sure that the cluster has enough resources.

Is there a way I can configure my job to fail fast if parallelism is larger than the total number of slots?


Solution

  • The Flink JobManager will try to find resources across your cluster.

    You need to give the JobManager some time to do this; how much depends on your cluster size and network.

    To get the exception sooner, lower slot.request.timeout in your Flink configuration. Its default is 300000 ms, which is the timeout reported in the error above.
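
Lowering the timeout only shortens the wait. If you want to refuse the job before it is submitted, one option is a client-side check against the JobManager's REST API. The sketch below is not a built-in Flink feature: it assumes a standalone or session cluster with a fixed number of TaskManagers, that the REST endpoint is reachable at a URL such as http://localhost:8081 (an assumption, adjust for your setup), and that the /overview response carries a "slots-total" field. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def slots_shortfall(overview: dict, parallelism: int) -> int:
    """Return how many slots are missing for the given parallelism (0 if enough)."""
    # "slots-total" is read from the /overview response; a missing key counts as 0.
    total = int(overview.get("slots-total", 0))
    return max(0, parallelism - total)


def fetch_overview(rest_url: str, timeout_s: float = 5.0) -> dict:
    """Fetch the cluster overview from the JobManager REST endpoint."""
    with urlopen(f"{rest_url.rstrip('/')}/overview", timeout=timeout_s) as resp:
        return json.load(resp)


def check_before_submit(rest_url: str, parallelism: int) -> None:
    """Raise immediately if the cluster cannot host the requested parallelism."""
    missing = slots_shortfall(fetch_overview(rest_url), parallelism)
    if missing:
        raise RuntimeError(
            f"parallelism {parallelism} needs {missing} more slot(s) than the cluster has"
        )


# Example (assumed URL), run before calling `flink run -p 4 ...`:
# check_before_submit("http://localhost:8081", 4)
```

This compares against the total slot count, as in the question. If other jobs share the cluster, comparing against "slots-available" instead may be closer to what you want. On deployments that start TaskManagers on demand, the slot count at submission time says little, so a check like this would not apply.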