azure-batch

How to give a job manager task permissions to resize the pool?


I'm running embarrassingly parallel workloads, but the number of parallel tasks is not known beforehand. Instead, my job manager task performs a simple computation to determine the number of parallel tasks and then adds the tasks to the job.

As soon as I know the number of parallel tasks, I would like to resize the pool I'm running in accordingly (I am running the job in an auto-pool). Here is how I try to do this.

When I create the JobManagerTask I supply

...
  authentication_token_settings=AuthenticationTokenSettings(
      access=[AccessScope.job]),
...

At run time the task receives AZ_BATCH_AUTHENTICATION_TOKEN in its environment, uses it to create a BatchServiceClient, uses the client to add worker tasks to the job, and finally calls client.pool.resize() to increase target_dedicated_nodes. At this stage the task gets an error from the service:

.../site-packages/azure/batch/operations/_pool_operations.py", line 1310, in resize
    raise models.BatchErrorException(self._deserialize, response)
azure.batch.models._models_py3.BatchErrorException: Request encountered an exception.
Code: PermissionDenied
Message: {'additional_properties': {}, 'lang': 'en-US', 'value': 'Server failed to authorize the request.\nRequestId:4b34d8e5-7c28-4af2-9e1f-9cf88a486511\nTime:2020-11-26T17:32:55.7673310Z'}
AuthenticationErrorDetail: The supplied authentication token does not have permission to call the requested Url.

How can I give the task permission to resize the pool?


Solution

  • Currently, AZ_BATCH_AUTHENTICATION_TOKEN is limited to permissions on the job itself. The pool is a separate resource, even in the auto-pool configuration, so it cannot be modified with the token.

    There are two main approaches you can take. You can add a certificate to your account and add it to your pool, which lets the task authenticate as a ServicePrincipal with permissions to your account. Alternatively, you can set your pool to autoscale depending on the number of pending tasks. Autoscaling does not resize the pool immediately; instead, the formula is evaluated at set intervals and the pool is resized as needed.
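    For the autoscale approach, here is a minimal sketch of a helper that builds a formula scaling the pool with the number of pending tasks. The function name build_autoscale_formula, its parameters, and the 70 percent sample threshold are illustrative assumptions, loosely modelled on the pending-tasks example in the Batch autoscale documentation. The resulting string would then be supplied as auto_scale_formula (with enable_auto_scale=True and no fixed target_dedicated_nodes) on the auto-pool's PoolSpecification.

```python
def build_autoscale_formula(max_nodes: int, sample_window_seconds: int = 180) -> str:
    """Build a Batch autoscale formula that tracks pending tasks.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")

    # Time window over which pending-task samples are read.
    window = f"{sample_window_seconds} * TimeInterval_Second"

    lines = [
        # Upper bound on the pool size.
        f"maxNodes = {max_nodes};",
        # Fraction of expected samples actually available in the window.
        f"samplePercent = $PendingTasks.GetSamplePercent({window});",
        # With too few samples, fall back to one node (enough for the job manager).
        f"pending = samplePercent < 70 ? 1 : avg($PendingTasks.GetSample({window}));",
        # Scale with pending tasks, capped at maxNodes.
        "$TargetDedicatedNodes = min(maxNodes, pending);",
        # Let running tasks finish before a node is removed on scale-down.
        "$NodeDeallocationOption = taskcompletion;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_autoscale_formula(max_nodes=25))
```

    Because the service evaluates the formula on a schedule (the shortest allowed interval is five minutes), the pool will grow some time after the job manager has added its tasks rather than right away.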