pytorch computer-vision azure-pipelines uri azure-machine-learning-service

When creating a component in Azure ML studio, the component is not able to save files to an output folder URI. The component transforms images and saves them to the output folders.

In Azure ML studio, we can build components to perform different machine learning tasks. I am creating a component that has one input, an input image folder (URI), and two output folders (URIs). The component takes images from the input folder, transforms them using PyTorch, and tries to save them to the output folder. I get the following error after executing the command component from a YAML file:

Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details.

Error: Traceback (most recent call last):
  File "prep_1.py", line 177, in <module>
    main(args)
  File "prep_1.py", line 169, in main
    prepare_data_component(args.input_data, args.training_data, args.val_data)
  File "prep_1.py", line 114, in prepare_data_component
    image.save(save_path)
  File "/azureml-envs/azureml_7e9e1abac3aeb5e2560b92cd769d118a/lib/python3.7/site-packages/PIL/Image.py", line 2428, in save
    fp = builtins.open(filename, "w+b")
OSError: [Errno 30] Read-only file system: '/mnt/azureml/cr/j/50e4xxxxxxxx25a02xxxxxx/cap/data-capability/wd/INPUT_input_data/train/chickens/trial.jpg'

I want to know how to write/save images to an output URI from a .py file executed as a command from a YAML file.

Solution

I am creating Azure ML components from a YAML file. Here is what a simple YAML file for creating a component looks like:

prep.yaml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: prep_image_classification_pytorch
display_name: Data Preparation for Image Classification Pytorch
inputs:
  input_data: 
    type: uri_folder
outputs:
  training_data:
    type: uri_folder
  val_data:
    type: uri_folder 
code: ./
command: python prep.py --input_data ${{inputs.input_data}} --training_data ${{outputs.training_data}} --val_data ${{outputs.val_data}}
environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04

The command field runs the Python source file, prep.py. At run time Azure ML replaces each ${{inputs...}} and ${{outputs...}} placeholder with a concrete folder path.

prep.py receives these paths as command-line arguments, which we parse with the argparse library.
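To check this argument wiring locally without submitting a job, one option is a variant of parse_args that accepts an explicit argv list (the argv parameter is an addition for illustration, not part of the original prep.py). The mount paths below are made up; the real ones are chosen by Azure ML when it resolves the placeholders.

```python
import argparse
import shlex


def parse_args(argv=None):
    # Same three arguments as the command line in prep.yaml.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_data", type=str, help="path of input data")
    parser.add_argument("--training_data", type=str, default="./", help="output path of train data")
    parser.add_argument("--val_data", type=str, default="./", help="output path of validation data")
    # argv=None means "read sys.argv", which is what happens inside the job.
    return parser.parse_args(argv)


# Hypothetical resolved command: the paths are invented for this example.
resolved = ("python prep.py --input_data /mnt/example/INPUT_input_data "
            "--training_data /mnt/example/training_data "
            "--val_data /mnt/example/val_data")
args = parse_args(shlex.split(resolved)[2:])  # drop "python prep.py"
print(args.training_data)  # → /mnt/example/training_data
```

Inside the actual job you would call parse_args() with no arguments so that argparse reads sys.argv.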

prep.py

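import argparse

# Input/Output come from the azure-ai-ml SDK and are used here only as annotations
from azure.ai.ml import Input, Output


def parse_args():
    # set up argparse
    parser = argparse.ArgumentParser()

    # add arguments; Azure ML fills these in from the ${{...}} placeholders in prep.yaml
    parser.add_argument("--input_data", type=str, help="path of input data")
    parser.add_argument("--training_data", type=str, default="./", help="output path of train data")
    parser.add_argument("--val_data", type=str, default="./", help="output path of validation data")

    # parse and return args
    return parser.parse_args()


def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    val_data: Output(type="uri_folder"),
):
    # body not shown in the original answer; transformed images must be saved
    # under training_data / val_data, never under the read-only input_data mount
    ...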

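As shown in the code above, use "./" as the default for the output URI folder arguments. Azure ML creates these folders in its blob storage and passes their paths to the script through ${{outputs.training_data}} and ${{outputs.val_data}}. Specifying a path of my own choosing inside Azure Blob Storage did not work for me. Note also that the original error occurred because the image was saved under the INPUT_input_data folder, which Azure ML mounts read-only; the transformed images must be written under the training_data and val_data paths instead.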
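The fix described above (read from the input mount, write under the output path) can be sketched as a small helper, assuming the input folder keeps a class sub-folder layout such as train/chickens/trial.jpg. The function name save_transformed and the IMAGE_EXTENSIONS set are hypothetical. To keep the sketch dependency-free, the transform is a bytes-to-bytes callable; in the real component you would instead open each file with PIL, apply the PyTorch/torchvision transforms, and call image.save() on the destination path.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical set of file extensions treated as images; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def save_transformed(input_dir: str, output_dir: str,
                     transform: Callable[[bytes], bytes]) -> int:
    """Apply transform to every image under input_dir and write the result to
    the same relative path under output_dir. Returns the number of files written."""
    in_root = Path(input_dir)
    out_root = Path(output_dir)
    written = 0
    # sorted() materializes the listing before any new files are created
    for src in sorted(in_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Keep the class sub-folder layout (e.g. train/chickens/trial.jpg),
        # but root it in the writable output folder, not the read-only input mount.
        dst = out_root / src.relative_to(in_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(transform(src.read_bytes()))
        written += 1
    return written


if __name__ == "__main__":
    # Local demo with temporary folders standing in for the Azure ML mounts.
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = Path(tmp, "input", "train", "chickens")
        src_dir.mkdir(parents=True)
        (src_dir / "trial.jpg").write_bytes(b"raw-bytes")
        n = save_transformed(str(Path(tmp, "input")), str(Path(tmp, "out")),
                             lambda data: data.upper())
        print(n, Path(tmp, "out", "train", "chickens", "trial.jpg").read_bytes())
        # prints: 1 b'RAW-BYTES'
```

In prep.py this would be called with args.input_data (or one of its sub-folders) as the source and args.training_data or args.val_data as the destination, so nothing is ever written into the input mount.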