azure-machine-learning-service azure-aks

AML - Web service TimeoutError


We created a web service endpoint and tested it both with the script below and with Postman.

We deployed the service to an AKS cluster in the same resource group and subscription as the AML resource.

UPDATE: the attached AKS cluster had a custom networking configuration and rejected external connections.

import json
import sys

from azureml.core import Workspace
from azureml.core.webservice import Webservice
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
# Get workspace
ws = Workspace.from_config(auth=cli_auth)

# Get the AKS Details
try:
    with open("../aml_config/aks_webservice.json") as f:
        config = json.load(f)
except FileNotFoundError:
    print("No new model, thus no deployment on AKS")
    # raise Exception('No new model to register as production model perform better')
    sys.exit(0)

service_name = config["aks_service_name"]
# Get the hosted web service
service = Webservice(workspace=ws, name=service_name)

# Input for Model with all features
input_j = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]
print(input_j)
test_sample = json.dumps({"data": input_j})
test_sample = bytes(test_sample, encoding="utf8")
try:
    prediction = service.run(input_data=test_sample)
    print(prediction)
except Exception as e:
    # Print the underlying error, then re-raise with the original cause attached
    print(e)
    raise Exception("AKS service is not working as expected") from e

In AML Studio, the deployment state is "Healthy".

Endpoint attributes

We get the following error when testing:

Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Log just after deploying the AKS Webservice here.

Log after running the test script here.

How can we know what is causing this problem and fix it?


Solution

  • We checked the AKS networking configuration and found that the cluster uses an Azure CNI network profile.

    With this setup, the web service has to be tested from inside the virtual network that was created. Once we ran the test from there, it worked.
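A quick way to tell a network-level block apart from a problem in the scoring code is to check whether a plain TCP connection to the endpoint can be opened at all. The following is a minimal sketch using only the Python standard library; the helper names `endpoint_from_uri` and `can_connect` are hypothetical and are not part of the azureml SDK.

```python
import socket
from urllib.parse import urlparse


def endpoint_from_uri(scoring_uri):
    """Split a scoring URI into (host, port), defaulting the port by scheme."""
    parsed = urlparse(scoring_uri)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (such as WinError 10060), refused connections
        # and name-resolution failures.
        return False
```

Assuming the scoring URI is available (for example from `service.scoring_uri` in the script above), passing it through `endpoint_from_uri` and then `can_connect` from two places is informative: if the check fails from a workstation but succeeds from a machine inside the virtual network, the deployment itself is probably fine and the cluster's networking is rejecting outside traffic.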