I have an AWS Batch job that runs successfully when the memory requirement is <15GB but gets stuck in RUNNABLE when more memory is required.
The Batch troubleshooting guide says that this might happen because of insufficient resources:
Jobs Stuck in RUNNABLE Status
Insufficient resources
If your job definitions specify more CPU or memory resources than your compute resources can allocate, then your jobs will never be placed. For example, if your job specifies 4 GiB of memory, and your compute resources have less than that, then the job cannot be placed on those compute resources. In this case, you must reduce the specified memory in your job definition or add larger compute resources to your environment.
However, the ComputeResources InstanceTypes is set to optimal, and Batch appears to select different instance types (e.g. r4.large) based on changes to memory requirements. So I do not understand why Batch is unable to select an appropriate resource with sufficient memory.
The jobs eventually moved out of RUNNABLE and completed successfully (the largest job that ran used 64GB). So it looks like the Compute Resources were able to be set up properly with InstanceType of optimal.
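For anyone checking the same settings, here is a minimal sketch of how the relevant compute environment fields could be pulled out for inspection. It assumes a response shaped like the one boto3's `describe_compute_environments` returns. The helper name `summarize_compute_environments` and the sample data (including the environment name `my-compute-env` and the vCPU numbers) are made up for illustration and are not from my actual setup.

```python
def summarize_compute_environments(response):
    """Collect instance types and vCPU limits for each compute environment.

    `response` is assumed to have the shape returned by boto3's
    batch.describe_compute_environments().
    """
    summary = []
    for env in response.get("computeEnvironments", []):
        resources = env.get("computeResources", {})
        summary.append({
            "name": env.get("computeEnvironmentName"),
            "state": env.get("state"),
            "status": env.get("status"),
            "instanceTypes": resources.get("instanceTypes", []),
            "minvCpus": resources.get("minvCpus"),
            "maxvCpus": resources.get("maxvCpus"),
            "desiredvCpus": resources.get("desiredvCpus"),
        })
    return summary


# Hand-written sample for illustration only; not real output.
sample = {
    "computeEnvironments": [
        {
            "computeEnvironmentName": "my-compute-env",  # hypothetical name
            "state": "ENABLED",
            "status": "VALID",
            "computeResources": {
                "instanceTypes": ["optimal"],
                "minvCpus": 0,
                "maxvCpus": 16,
                "desiredvCpus": 0,
            },
        }
    ]
}

# With credentials configured, the real response could be fetched with:
#   import boto3
#   response = boto3.client("batch").describe_compute_environments()
for env in summarize_compute_environments(sample):
    print(env)
```

Comparing `instanceTypes` and `maxvCpus` against what the job definition requests is one way to see whether the environment is even allowed to launch an instance large enough for the job.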