azure-machine-learning-service

Azure ML: how to access logs of a failed Model deployment


I'm deploying a Keras model that is failing with the error below. The exception says that I can retrieve the logs by running "print(service.get_logs())", but that gives me empty results. I am deploying the model from my Azure notebook, and I'm using the same "service" variable to retrieve the logs.

Also, how can I retrieve the logs from the container instance? I'm deploying to an AKS compute cluster I created. Sadly, the docs link in the exception doesn't detail how to retrieve these logs either.
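
For reference, a minimal sketch of what I'm running to fetch the logs (the by-name lookup is an alternative I'm assuming should behave the same):

from azureml.core.webservice import Webservice

# Direct handle returned by Model.deploy -- this is what comes back empty for me
print(service.get_logs())

# Alternative: look the service up by name in the workspace
print(Webservice(ws, 'my-model-service').get_logs())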

More information can be found using '.get_logs()' Error:

{
  "code": "KubernetesDeploymentFailed",
  "statusCode": 400,
  "message": "Kubernetes Deployment failed",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.\nPlease check the logs for your container instance: my-model-service. From the AML SDK, you can run print(service.get_logs()) if you have the service object to fetch the logs.\nYou can also try to run image mlwks.azurecr.io/azureml/azureml_3c0c34b65cf18c8644e8d745943ab7d2:latest locally. Please refer to http://aka.ms/debugimage#service-launch-fails for more information."
    }
  ]
}

UPDATE

Here's my code to deploy the model:

from azureml.core import Environment
from azureml.core.compute import ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import Webservice
from azureml.exceptions import WebserviceException

environment = Environment('my-environment')
environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults", "azureml-dataprep[pandas,fuse]",
                  "tensorflow", "keras", "matplotlib"])
service_name = 'my-model-service'

# Remove any existing service under the same name.
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    pass

inference_config = InferenceConfig(entry_script='score.py', environment=environment)
comp = ComputeTarget(workspace=ws, name="ml-inference-dev")
service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_target=comp)
service.wait_for_deployment(show_output=True)

And my score.py

import numpy as np

from azureml.core.model import Model
from keras.models import load_model
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    global model

    # Locate the registered model file and load it once at startup.
    model_path = Model.get_model_path('model.h5')
    model = load_model(model_path)


# The run() method is called each time a request is made to the scoring API.
#
# Shown here are the optional input_schema and output_schema decorators
# from the inference-schema pip package. Using these decorators on your
# run() method parses and validates the incoming payload against
# the example input you provide here. This will also generate a Swagger
# API document for your web service.
@input_schema('data', NumpyParameterType(np.array([[0.1, 1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0]])))
@output_schema(NumpyParameterType(np.array([4429.929236457418])))
def run(data):

    return [123]  # test
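
Once this deploys, I'd expect run() to call the model instead of returning a constant; a minimal sketch of what I have in mind (assuming the decorator hands run() a numpy array, which is what NumpyParameterType does):

def run(data):
    # 'data' arrives as a numpy array courtesy of NumpyParameterType
    predictions = model.predict(data)
    # Return something JSON-serializable for the web service response
    return predictions.tolist()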

UPDATE 2

Here is a screenshot of the endpoint page. Is it normal for the CPU to be 0.1? Also, when I hit the Swagger URL in the browser, I get the error: "No ready replicas for service doc-classify-env-service"

[screenshot of the endpoint page]
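
On the CPU question: 0.1 cores is the documented default for an AKS web service when no deployment configuration is passed, so the value itself may be expected. A sketch of requesting more resources (the numbers are illustrative, not a recommendation):

from azureml.core.webservice import AksWebservice

# Defaults are cpu_cores=0.1 and memory_gb=0.5 when no config is supplied
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config,
                       deployment_target=comp)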

UPDATE 3

After finally getting to the container logs, it turns out it was choking on this error in my score.py:

ModuleNotFoundError: No module named 'inference_schema'

I then ran a test that commented out the references to "input_schema" and "output_schema" and also simplified my pip_packages, and the REST endpoint came up! I was also able to get a prediction out of the model.

pip_packages=["azureml-defaults", "tensorflow", "keras"]

So my question is: how should I set up my pip_packages so the scoring file can use the inference_schema decorators? I assumed I would need the azureml-sdk[automl] pip package, but when I include it, the image creation fails and I see several dependency conflicts.
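
A lighter option than azureml-sdk[automl] would be to depend on the inference-schema package directly; it ships separately from azureml-defaults, and the numpy-support extra is what I understand pulls in the dependencies for NumpyParameterType:

environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults",
                  "inference-schema[numpy-support]",  # provides the schema decorators
                  "tensorflow",
                  "keras"])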


Solution

  • Try retrieving your service from the workspace directly

    ws.webservices[service_name].get_logs()
    

Also, I found deploying an image as an endpoint to be easier than the inference config + Model.deploy route (depending on your use case):

from azureml.core.image import Image
from azureml.core.webservice import AksWebservice

my_image = Image(ws, name='test', version='26')
service = AksWebservice.deploy_from_image(ws, "test1", my_image, deployment_config, aks_target)
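
For completeness, deployment_config and aks_target need to be defined for the snippet above to run; a sketch of how I'd construct them (the compute name and resource numbers are illustrative):

from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

aks_target = AksCompute(ws, "ml-inference-dev")  # existing AKS compute target
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)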