Tags: azure, machine-learning, endpoint, azure-batch, scoring

Azure Batch Endpoint job ends with TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch


I'm trying to run an Azure ML batch endpoint job, but the job always fails with an input-related error (see below). I used a model created and trained in the Azure ML designer, as described on this page: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-model-designer?view=azureml-api-1

The error in "logs/azureml/stderrlogs.txt" is:

TypeError: the JSON object must be str, bytes or bytearray, not MiniBatch

My scoring script (auto-generated for the model):

import json
import os
from collections import defaultdict
from pathlib import Path
from typing import List

from azureml.designer.serving.dagengine.converter import create_dfd_from_dict
from azureml.designer.serving.dagengine.utils import decode_nan
from azureml.studio.common.datatable.data_table import DataTable
from azureml.studio.core.io.model_directory import ModelDirectory
from azureml.studio.modules.ml.score.score_generic_module.score_generic_module import ScoreModelModule


model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'trained_model_outputs')
schema_file_path = Path(model_path) / '_schema.json'
with open(schema_file_path) as fp:
    schema_data = json.load(fp)


def init():
    global model
    model = ModelDirectory.load(model_path).model


def run(data):
    # json.loads is where the TypeError above is raised: `data` arrives
    # as a MiniBatch, not as a JSON string
    data = json.loads(data)
    input_entry = defaultdict(list)
    for row in data:
        for key, val in row.items():
            input_entry[key].append(decode_nan(val))

    data_frame_directory = create_dfd_from_dict(input_entry, schema_data)
    score_module = ScoreModelModule()
    result, = score_module.run(
        learner=model,
        test_data=DataTable.from_dfd(data_frame_directory),
        append_or_result_only=True)
    return json.dumps({"result": result.data_frame.values.tolist()})

Definition of the input:

input = Input(type=AssetTypes.URI_FILE, path="azureml://subscriptions/$$$$$$$$/resourcegroups/$$$$$$$$$/workspaces/$$$$$/datastores/workspaceblobstore/paths/UI/2023-08-24_193934_UTC/samples.json")
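For reference, Input and AssetTypes above come from the v2 SDK (assuming the azure-ai-ml package):

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes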

Definition of the job:

job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    input=input,
)
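To watch the job while debugging, the v2 SDK can stream its logs (assuming ml_client is an authenticated MLClient; batch_endpoints.invoke returns a job whose name can be passed to jobs.stream):

# Follow the batch job's logs until it finishes or fails
ml_client.jobs.stream(job.name)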

I've read and watched various tutorials and documentation and tried the solutions from them, but nothing has helped, and I've been stuck on this error for several hours, so I'm asking for help.


Solution

  • The batch endpoint expects JSON files, but for some reason Azure adds a hidden ".amlignore" file to the URI_FOLDER the mini-batches are imported from. The scoring script couldn't process that file, which is what caused the errors (see the defensive sketch after the listing). My input folder content looked like this:

    "minibatch": [".amlignore", "samples.json", "samples1.json"]