Tags: azure, docker, pip, conda, azure-machine-learning-service

How to reuse successfully built docker images in Azure ML?


At our company I use Azure ML, and I have run into the following issue. I specify a conda_requirements.yaml file with the PyTorch estimator class, like so (the ... are placeholders so that I do not have to type everything out):

from azureml.train.dnn import PyTorch
est = PyTorch(source_directory='.',
              script_params=...,
              compute_target=...,
              entry_script=...,
              conda_dependencies_file_path='conda_requirements.yaml',
              environment_variables=...,
              framework_version='1.1')

The conda_requirements.yaml (shortened version of the pip part) looks like this:

dependencies:
  -  conda=4.5.11
  -  conda-package-handling=1.3.10
  -  python=3.6.2
  -  cython=0.29.10
  -  scikit-learn==0.21.2
  -  anaconda::cloudpickle==1.2.1
  -  anaconda::cffi==1.12.3
  -  anaconda::mxnet=1.1.0
  -  anaconda::psutil==5.6.3
  -  anaconda::pip=19.1.1
  -  anaconda::six==1.12.0
  -  anaconda::mkl==2019.4
  -  conda-forge::openmpi=3.1.2
  -  conda-forge::pycparser==2.19
  -  tensorboard==1.13.1
  -  tensorflow==1.13.1
  -  pip:
        - torch==1.1.0
        - torchvision==0.2.1

This successfully builds on Azure. Now, in order to reuse the resulting docker image, I pass its name via the custom_docker_image parameter to the generic Estimator:

from azureml.train.estimator import Estimator
est = Estimator(source_directory='.',
                script_params=...,
                compute_target=...,
                entry_script=...,
                custom_docker_image='<container registry name>.azurecr.io/azureml/azureml_c3a4f...',
                environment_variables=...)

But now Azure somehow seems to rebuild the image again, and when I run the experiment it cannot install torch. So it seems to install only the conda dependencies and not the pip dependencies; in any case, I do not want Azure to rebuild the image at all. Can I solve this somehow?

I also attempted to build a docker image from my Dockerfile myself and push it to the registry. I can do az login, and according to https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication I should then also be able to do an acr login and push. This does not work. Even taking the credentials from

az acr credential show --name <container registry name>

and then doing a

docker login <container registry name>.azurecr.io -u <username from credentials above> -p <password from credentials above>

does not work. The error message is "authentication required", even though I ran

az login

successfully. I would also be happy if someone could explain that, in addition to how to reuse docker images when using Azure ML. Thank you!
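
For reference, the full sequence I am attempting looks roughly like this (the repository name and tag here are placeholders I added for illustration, not the actual values):

az login
az acr login --name <container registry name>
docker build -t <container registry name>.azurecr.io/<repository>:<tag> .
docker push <container registry name>.azurecr.io/<repository>:<tag>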


Solution

  • AzureML should actually cache your docker image once it has been created. The service hashes the base docker information and the contents of the conda.yaml file and uses that as the hash key -- unless you change any of that information, the docker image should come from the ACR.

    As for the custom docker usage, did you set the parameter user_managed=True? Otherwise, AzureML will treat your docker image as a base image on top of which it creates the conda environment per your yaml file; a minimal sketch of the user-managed setup follows below.
    There is an example of how to use a custom docker image in this notebook: https://github.com/Azure/MachineLearningNotebooks/blob/4170a394edd36413edebdbab347afb0d833c94ee/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb
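
    To make that concrete, a user-managed run that reuses an already-built image from a private ACR might look roughly like the sketch below. This is a minimal sketch, not code from the question: the registry address, username, password and image name are placeholders, and it assumes the Estimator parameters image_registry_details and user_managed together with the ContainerRegistry class from azureml.core (check these against your SDK version).

    from azureml.core.container_registry import ContainerRegistry
    from azureml.train.estimator import Estimator

    # Credentials for the private registry that already holds the built image
    # (placeholders -- e.g. taken from `az acr credential show`).
    registry = ContainerRegistry()
    registry.address = '<container registry name>.azurecr.io'
    registry.username = '<username>'
    registry.password = '<password>'

    est = Estimator(source_directory='.',
                    script_params=...,
                    compute_target=...,
                    entry_script=...,
                    # Image name as it exists inside the registry (truncated name kept from the question).
                    custom_docker_image='azureml/azureml_c3a4f...',
                    image_registry_details=registry,
                    # Reuse the image as-is; do not build a new conda environment on top of it.
                    user_managed=True,
                    environment_variables=...)

    With user_managed=True, AzureML assumes the image already contains every conda and pip dependency (including torch), so nothing is reinstalled when the run starts.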