Scenario: I have multiple tasks running DL models on the same dataset. Downloading the same dataset in every task is becoming wasteful, so I am looking for a way to persist the downloaded data across different task runs that need the same dataset.
I explored ResourceFiles and ApplicationPackages, but as per my understanding they do not suit my requirement.
With Docker volume capabilities, I could run my tasks against the same volume and the downloaded data would persist on the VM. Since Azure Batch does not directly expose the "docker run" command for running the container, is there another way to specify volumes for Batch tasks using the Python SDK?
Can we use the "container_run_options" of TaskContainerSettings to specify Docker volumes?
Edit
I tried specifying a volume in TaskContainerSettings, but when writing to the mounted path I got a permission denied error:
PermissionError: [Errno 13] Permission denied: '/opt/docker/Gy9EKVB728YcVZgn7e2AVuuQ/00000001.jpg'
Found a way to use Docker volumes.
First: use the "container_run_options" of TaskContainerSettings to specify the Docker volume.
import azure.batch.models as batchmodels

# volume_name is the name of the Docker volume; mount_path is the path inside the container
task_container_settings = batchmodels.TaskContainerSettings(
    image_name=image_name,
    container_run_options=f"-v {volume_name}:{mount_path}"
)
This mounts a volume with the given name under /mnt/docker/volumes on the node and makes it accessible inside the container at the specified path.
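For example, a minimal sketch with placeholder names (the volume dataset_cache and the path /data are only illustrations, not anything required by Batch): Docker creates the named volume on the node the first time it is used, and every task that passes the same option sees the same data, which is what lets the dataset persist across task runs on that node.

# placeholder names; tasks that pass this same option share the volume
shared_volume_opts = "-v dataset_cache:/data"

download_task_settings = batchmodels.TaskContainerSettings(
    image_name=image_name,
    container_run_options=shared_volume_opts
)
training_task_settings = batchmodels.TaskContainerSettings(
    image_name=image_name,
    container_run_options=shared_volume_opts
)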
Second: run the task with pool scope and elevated (admin) privileges. Without this, you will get a permission error when trying to write to the mounted volume path inside the container.
task = batchmodels.TaskAddParameter(
    id=task_id,
    command_line=command,
    container_settings=task_container_settings,
    user_identity=batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope=batchmodels.AutoUserScope.pool,
            elevation_level=batchmodels.ElevationLevel.admin
        )
    )
)
This runs the task with root privileges so that the container spun up by the task has write access to the mounted volume.
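Putting both steps together, here is a hedged end-to-end sketch of submitting such a task. The volume name dataset_cache, the mount path /data, and the download_dataset.sh / train.py commands are hypothetical placeholders rather than anything from the Batch API; only the SDK types and the task.add call are real.

import azure.batch.models as batchmodels

def add_training_task(batch_client, job_id, task_id, image_name):
    # Mount a shared named volume; Docker creates "dataset_cache" on first use (placeholder name)
    container_settings = batchmodels.TaskContainerSettings(
        image_name=image_name,
        container_run_options="-v dataset_cache:/data"
    )

    # Hypothetical command: skip the download when an earlier task on this node
    # has already populated the mounted volume
    command = (
        '/bin/bash -c "'
        '[ -f /data/.downloaded ] || (./download_dataset.sh /data && touch /data/.downloaded); '
        'python train.py --data-dir /data"'
    )

    task = batchmodels.TaskAddParameter(
        id=task_id,
        command_line=command,
        container_settings=container_settings,
        # pool scope + admin elevation so the container can write to the volume
        user_identity=batchmodels.UserIdentity(
            auto_user=batchmodels.AutoUserSpecification(
                scope=batchmodels.AutoUserScope.pool,
                elevation_level=batchmodels.ElevationLevel.admin
            )
        )
    )
    batch_client.task.add(job_id=job_id, task=task)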