We queue a lot of tasks in Azure Batch and have 8 nodes in our pool to process them. For the past two days, we have been seeing strange behaviour.
The node now remains idle even though we have 1000+ tasks queued and waiting to be processed by the pool.
Rebooting the node puts it into an error state. It then starts up again, processes several tasks, and stops picking up new tasks again.
What I've checked:
Is this a bug in Azure Batch scheduling? (We haven't made any changes recently.)
If not a bug, how can we get more info about what's happening with these nodes during scheduling?
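One way to gather more information from the client side is to periodically snapshot each node's state next to the job's count of active (queued) tasks, and flag nodes that are not taking work while tasks are waiting. The sketch below is a hypothetical helper: `NodeSnapshot` and `diagnose` are illustrative names, not part of any SDK, and it assumes you fill the snapshots yourself, for example from the node list and job task counts reported by the Batch service. If a node is healthy, has scheduling enabled, and still sits idle while tasks are queued, that points to a service-side scheduling problem rather than a problem on the node.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for the node fields relevant to this check.
# In a real script these values would come from the Batch service
# (e.g. by listing the pool's compute nodes); plain data is used here
# so the sketch runs without credentials.
@dataclass
class NodeSnapshot:
    node_id: str
    state: str             # e.g. "idle", "running", "unusable", "starttaskfailed"
    scheduling_state: str  # "enabled" or "disabled"
    running_tasks: int = 0
    errors: List[str] = field(default_factory=list)


def diagnose(nodes: List[NodeSnapshot], active_tasks: int) -> List[str]:
    """Return findings for nodes that are not picking up queued work."""
    findings: List[str] = []
    if active_tasks == 0:
        return findings  # Nothing queued, so idle nodes are expected.
    for node in nodes:
        if node.scheduling_state != "enabled":
            # Node was explicitly taken out of scheduling.
            findings.append(f"{node.node_id}: scheduling is {node.scheduling_state}")
        elif node.state in ("unusable", "starttaskfailed"):
            # Node is unhealthy, so the scheduler will not assign tasks to it.
            findings.append(f"{node.node_id}: node is {node.state}")
        elif node.state == "idle" and node.running_tasks == 0:
            # Healthy and schedulable, yet idle while tasks are queued.
            findings.append(
                f"{node.node_id}: idle with {active_tasks} active task(s) queued"
            )
        if node.errors:
            findings.append(f"{node.node_id}: errors {node.errors}")
    return findings


pool = [
    NodeSnapshot("node-1", "running", "enabled", running_tasks=1),
    NodeSnapshot("node-2", "idle", "enabled"),
    NodeSnapshot("node-3", "idle", "disabled"),
]
for line in diagnose(pool, active_tasks=1000):
    print(line)
# prints:
# node-2: idle with 1000 active task(s) queued
# node-3: scheduling is disabled
```

Logging these findings with timestamps over a few scheduling cycles gives you concrete evidence (which nodes, how long idle, how many tasks queued) to attach to a support ticket.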
I opened a support ticket with Microsoft. It turned out to be a scheduling bug, which has now been fixed, and everything is working again.