Tags: azure-machine-learning-service, azureml-python-sdk, azuremlsdk

AzureMLCompute job failed with `FailedLoginToImageRegistry`


I've been trying to submit a training job through the Azure ML Python SDK with:

from azureml.core import Workspace, Experiment, ScriptRunConfig 

if __name__ == "__main__":
    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name='ConstructionTopicsModel')

    config = ScriptRunConfig(source_directory='./',
                             script='src/azureml/train.py',
                             arguments=None,
                             compute_target='ComputeTargetName',
                             )

    env = ws.environments['test-env']
    config.run_config.environment = env
    run = experiment.submit(config)
    
    run.wait_for_completion(show_output=True)

    aml_url = run.get_portal_url()
    print(aml_url)

But the run failed with this ServiceError message:

AzureMLCompute job failed. FailedLoginToImageRegistry: Unable to login to docker image repo
Reason: Failed to login to the docker registry
error: WARNING! Using --password via the CLI is insecure. Use --password-stdin. Error saving credentials: error storing credentials - err: exit status 1, out: `Cannot autolaunch D-Bus without X11 $DISPLAY`

serviceURL: 7ac86b04d6564d36aa80ae2ad090582c.azurecr.io
Reason: WARNING! Using --password via the CLI is insecure. Use --password-stdin. Error saving credentials: error storing credentials - err: exit status 1, out: `Cannot autolaunch D-Bus without X11 $DISPLAY`

Info: Failed to setup runtime for job execution: Job environment preparation failed on 10.0.0.5 with err exit status 1.
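When comparing runs that fail this way, it can help to pull the relevant fields out of the error text instead of reading the whole log each time. The following is a minimal Python sketch and is not part of the Azure ML SDK: `parse_login_failure` and `RegistryLoginFailure` are hypothetical names, and the substrings it looks for are copied from the message above, so it assumes the service keeps that wording.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryLoginFailure:
    registry: Optional[str]       # value of the "serviceURL:" field, if present
    credential_store_error: bool  # Docker failed while saving credentials
    dbus_without_display: bool    # the credential helper could not start D-Bus


def parse_login_failure(message: str) -> Optional[RegistryLoginFailure]:
    """Summarize a FailedLoginToImageRegistry message; return None for other errors."""
    if "FailedLoginToImageRegistry" not in message:
        return None
    match = re.search(r"serviceURL:\s*(\S+)", message)
    return RegistryLoginFailure(
        registry=match.group(1) if match else None,
        credential_store_error="error storing credentials" in message,
        dbus_without_display="Cannot autolaunch D-Bus" in message,
    )


if __name__ == "__main__":
    # Abbreviated copy of the error text from the question.
    sample = (
        "AzureMLCompute job failed. FailedLoginToImageRegistry: "
        "Unable to login to docker image repo\n"
        "error: Error saving credentials: error storing credentials - "
        "err: exit status 1, out: `Cannot autolaunch D-Bus without X11 $DISPLAY`\n"
        "serviceURL: 7ac86b04d6564d36aa80ae2ad090582c.azurecr.io\n"
    )
    print(parse_login_failure(sample))
```

Plain substring checks are used on purpose: the message is free text, so anything stricter than a search for its distinctive phrases would break on small formatting changes.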

I also tried submitting the job through the Azure CLI, without success: I got the same error message.


Solution

  • The only way I've found so far to make this work is to run the script from a terminal on the compute target itself; there, the Docker error goes away. Running the experiment from a terminal on a different compute instance raises the exception.
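The phrase "error storing credentials" in the log is the Docker CLI reporting that a configured credential helper (named by the `credsStore` or `credHelpers` entries of its `config.json`) failed. A helper backed by the desktop secret service, for example, talks to it over D-Bus, which matches the X11 `$DISPLAY` complaint. One way to compare the machine where submission works with the one where it fails is to inspect that file on each. This is a hedged diagnostic sketch, not a confirmed fix: the function names are hypothetical, and it assumes the Docker config being inspected is the one used during environment preparation, which the post does not establish.

```python
import json
import os
from pathlib import Path
from typing import Dict, Optional


def docker_config_path() -> Path:
    # The Docker CLI reads its config directory from DOCKER_CONFIG if set, else ~/.docker.
    config_dir = os.environ.get("DOCKER_CONFIG")
    base = Path(config_dir) if config_dir else Path.home() / ".docker"
    return base / "config.json"


def credential_helpers(config_text: str) -> Dict[str, object]:
    """Return the credential-helper settings found in a Docker config.json body."""
    config = json.loads(config_text) if config_text.strip() else {}
    found: Dict[str, object] = {}
    if "credsStore" in config:
        found["credsStore"] = config["credsStore"]
    if config.get("credHelpers"):
        found["credHelpers"] = config["credHelpers"]
    return found


def check_this_machine() -> Optional[Dict[str, object]]:
    """Inspect this machine's Docker config; None means no config.json exists."""
    path = docker_config_path()
    if not path.exists():
        return None
    return credential_helpers(path.read_text())


if __name__ == "__main__":
    helpers = check_this_machine()
    print("Docker config:", docker_config_path())
    if helpers is None:
        print("no config.json found")
    else:
        print("credential helpers:", helpers or "none configured")
    print("DISPLAY set:", "DISPLAY" in os.environ)
```

Running it on both machines prints where the config lives, any credential helpers it names, and whether `DISPLAY` is set in the current session, which gives two concrete things to compare.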