Search code examples
azure-machine-learning-service

How to get insights in exceptions and logging of AzureML endpoint deployment


Because of a faulty score.py file in my InferenceConfig, a Model.Deploy failed to Azure Machine Learning, using ACI. I wanted to create the endpoint in the cloud, but the only state I can see in the portal is Unhealthy. My local script to deploy the model (using ) keeps running, until it times out. (using the service.wait_for_deployment(show_output=True)statement).

Is there an option to get more insights in the actual reason/error message of the deployment turning "Unhealthy"?


Solution

  • Usually the timeout is caused by an error in init() function in scoring script. You can get the detailed logs using print(service.get_logs()) to find the Python error.

    For more comprehensive troubleshooting guide, see:

    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment