Azure AutoML Image Classification Job


I’m facing an issue while trying to create an MLTable YAML file for a dataset in Azure ML.

I have a default datastore in my workspace containing two folders (OK and NOK) with images. My goal is to read all images and use the folder name as the label for each image.

Here’s what I’ve tried so far:

import os
from azure.ai.ml import Input

mltable_yaml = """
type: mltable
paths:
  - file: ./OK
  - file: ./NOK
transformations:
  - read_from_directory:
      image_column: image_url
      folder_column: label
      recursive: true
"""

# Create directory and save MLTable
mltable_dir = "image_data"
os.makedirs(mltable_dir, exist_ok=True)
with open(os.path.join(mltable_dir, "MLTable"), "w") as f:
    f.write(mltable_yaml)

training_data = Input(
    type="mltable",
    path=mltable_dir
)

However, when I run the experiment, I encounter the following error:

MLTable input is invalid. UserErrorException:
    Message: Encountered user error while fetching data from Dataset. Error: UserErrorException:
    Message: MLTable yaml schema is invalid: 
Error Code: ScriptExecution.Validation
Validation Error Code: Invalid
Validation Target: Script
Native error: Dataflow script error: InvalidScriptElement("read_from_directory")
    ScriptError(InvalidScriptElement("read_from_directory"))
=> Invalid script element "read_from_directory"
    InvalidScriptElement("read_from_directory")
Error Message: Yaml script is invalid: InvalidScriptElement("read_from_directory").| session_id=1a30b15a-7e85-498b-b735-2348bfe0625b
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "MLTable yaml schema is invalid: \nError Code: ScriptExecution.Validation\nValidation Error Code: Invalid\nValidation Target: Script\nNative error: Dataflow script error: InvalidScriptElement(\"read_from_directory\")\n\tScriptError(InvalidScriptElement(\"read_from_directory\"))\n=> Invalid script element \"read_from_directory\"\n\tInvalidScriptElement(\"read_from_directory\")\nError Message: Yaml script is invalid: InvalidScriptElement(\"read_from_directory\").| session_id=1a30b15a-7e85-498b-b735-2348bfe0625b"
    }
}

From the error details, it seems like the read_from_directory element is not recognized, but I’m unsure how to structure the YAML to correctly map the folder name to the label.

How can I resolve this?


Solution

  • There is no read_from_directory transformation in the MLTable schema; check this documentation.

    For AutoML image classification, the data must be in a .jsonl file with the fields below; check this documentation.

    {
       "image_url":"azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image>",
       "image_details":{
          "format":"image_format",
          "width":"image_width",
          "height":"image_height"
       },
       "label":"class_name"
    }
    

    image_url and label are required fields, and image_url must be the complete datastore path to the image.
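
As a quick sanity check before submitting a job, each line of the .jsonl file can be tested against those two requirements. The sketch below is only illustrative: `validate_record` is a hypothetical helper (not part of the Azure SDK), and it assumes that a complete datastore path starts with `azureml://`, as in the example record above.

```python
import json
from typing import List

REQUIRED_FIELDS = ("image_url", "label")


def validate_record(line: str) -> List[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")

    # Assumption: a full datastore path uses the azureml:// scheme.
    url = record.get("image_url", "")
    if url and not str(url).startswith("azureml://"):
        problems.append("image_url is not a full azureml:// datastore path")
    return problems
```

An empty list means the line has both required fields and a datastore-style URL; anything else is worth fixing before the file is used as training data.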

    Follow the steps below to create the .jsonl file.

    First, you need the datastore path to each image, so create a new data asset and take its path.

    from azure.ai.ml.entities import Data
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml import Input, MLClient
    from azure.identity import DefaultAzureCredential
    
    credential = DefaultAzureCredential()
    
    # Reads the workspace details from a local config.json file.
    ml_client = MLClient.from_config(credential)
    
    # Describe the local ./images folder (containing OK and NOK) as a uri_folder data asset.
    my_data = Data(
        path="./images",
        type=AssetTypes.URI_FOLDER,
        description="Fridge-items images",
        name="items-images",
    )
    
    # Upload the folder and register the data asset in the workspace.
    uri_folder_data_asset = ml_client.data.create_or_update(my_data)
    

    Here, the images folder contains the OK and NOK folders.

    The datastore path is available in uri_folder_data_asset.path.

    Next, create the .jsonl file using the code below.

    import os
    import json
    
    # Local folder for each label; the folder name is also the folder name in the datastore.
    folders = {
        "OK": "./images/OK",
        "NOK": "./images/NOK"
    }
    
    mltable_dir = "image_data_mltable"
    os.makedirs(mltable_dir, exist_ok=True)
    
    output_file = os.path.join(mltable_dir, "image_data.jsonl")
    
    # Datastore URI of the uploaded images folder, without a trailing slash.
    base_uri = uri_folder_data_asset.path.rstrip("/")
    
    with open(output_file, "w") as jsonl_file:
        for label, folder_path in folders.items():
            folder_name = os.path.basename(folder_path)
            for file_name in os.listdir(folder_path):
                if file_name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp", ".gif")):
                    record = {
                        # Join with "/" directly so the URI is also correct on Windows.
                        "image_url": f"{base_uri}/{folder_name}/{file_name}",
                        "label": label
                    }
                    jsonl_file.write(json.dumps(record) + "\n")
    
    print(f"JSONL file created: {output_file}")
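
Before wiring the file into an MLTable, it can help to read it back and count the records per label, for example to spot an empty folder or a strong class imbalance. This is a minimal sketch using only the standard library; `count_labels` is a hypothetical helper name, and the path in the usage comment assumes the `image_data_mltable/image_data.jsonl` file written in this step.

```python
import json
from collections import Counter


def count_labels(jsonl_path):
    """Count how many records each label has in a JSONL annotation file."""
    counts = Counter()
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            counts[json.loads(line)["label"]] += 1
    return counts


# Example usage (assumes the .jsonl file from this step exists):
# print(count_labels("./image_data_mltable/image_data.jsonl"))
```

If one of the two labels is missing from the result, the corresponding folder was probably empty or contained no files with the accepted extensions.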
    

    Then create the MLTable file.

    mltable_yaml = """
    paths:
      - file: ./image_data.jsonl
    transformations:
      - read_json_lines:
            encoding: utf8
            invalid_lines: error
            include_path_column: false
      - convert_column_types:
          - columns: image_url
            column_type: stream_info       
    """
    
    with open(os.path.join(mltable_dir, "MLTable"), "w") as f:
        f.write(mltable_yaml)
    

    Use read_json_lines in the transformations; check this for how to prepare image data.

    Now use it as the input.

    import mltable
    
    training_data  = Input(type=AssetTypes.MLTABLE, path="./image_data_mltable")
    
    tbl = mltable.load(uri="./image_data_mltable")
    tbl.to_pandas_dataframe()
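
To connect this input to an AutoML run, the azure-ai-ml SDK provides the `automl.image_classification` job factory. The following is a rough sketch rather than a tested recipe: the compute name, experiment name, and limits are placeholder assumptions, the target column is the `label` field from the .jsonl file, and `build_image_classification_job` is a hypothetical wrapper introduced only for illustration.

```python
def build_image_classification_job(training_data, compute_name, experiment_name):
    """Build (but do not submit) an AutoML image classification job."""
    # Imported lazily so the helper can be defined without the SDK installed.
    from azure.ai.ml import automl

    job = automl.image_classification(
        compute=compute_name,             # assumed: name of an existing GPU cluster
        experiment_name=experiment_name,  # assumed: any experiment name you choose
        training_data=training_data,      # the MLTable Input created above
        target_column_name="label",       # column read from the .jsonl "label" field
    )
    # Placeholder limits; adjust them for your dataset and budget.
    job.set_limits(timeout_minutes=60, max_trials=1, max_concurrent_trials=1)
    return job


# Example usage (placeholder names; requires a configured ml_client):
# job = build_image_classification_job(training_data, "gpu-cluster", "ok-nok-classification")
# returned_job = ml_client.jobs.create_or_update(job)
```

Keeping job construction separate from submission makes it possible to inspect the job object before any compute is used.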
    

    You can refer to this sample GitHub documentation for AutoML classification to learn more.