Tags: pytorch, computer-vision, azure-pipelines, uri, azure-machine-learning-service

When creating a component in Azure ML studio, the component is not able to save files to the output folder URI. The component transforms images and saves


In Azure ML studio, we can build components to perform different machine learning tasks. I am creating a component that has one input, an input image folder (URI), and two output folders (URIs). The component takes images from the input folder, transforms them using PyTorch, and tries to save them to an output folder. I get the following error after executing the command component from a yaml file.

Execution failed. User process 'python' exited with status code 1. Please check log file 'user_logs/std_log.txt' for error details.

Error:

    Traceback (most recent call last):
      File "prep_1.py", line 177, in <module>
        main(args)
      File "prep_1.py", line 169, in main
        prepare_data_component(args.input_data, args.training_data, args.val_data)
      File "prep_1.py", line 114, in prepare_data_component
        image.save(save_path)
      File "/azureml-envs/azureml_7e9e1abac3aeb5e2560b92cd769d118a/lib/python3.7/site-packages/PIL/Image.py", line 2428, in save
        fp = builtins.open(filename, "w+b")
    OSError: [Errno 30] Read-only file system: '/mnt/azureml/cr/j/50e4xxxxxxxx25a02xxxxxx/cap/data-capability/wd/INPUT_input_data/train/chickens/trial.jpg'

I want to know how to write/save images to an output URI from a .py file that is executed as a command from a yaml file.


Solution

  • I am creating Azure ML components from a yaml file. Here is what a simple yaml file for creating a component looks like:

    prep.yaml

    $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
    type: command
    
    name: prep_image_classification_pytorch
    display_name: Data Preparation for Image Classification Pytorch
    inputs:
      input_data: 
        type: uri_folder
    outputs:
      training_data:
        type: uri_folder
      val_data:
        type: uri_folder 
    code: ./
    command: python prep.py --input_data ${{inputs.input_data}} --training_data ${{outputs.training_data}} --val_data ${{outputs.val_data}}
    environment:
      conda_file: ./conda.yaml
      image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04

    The command type runs the source Python file, which is prep.py.

    The prep.py file takes its arguments from the yaml file (as command-line arguments), which we parse with the argparse library.

    prep.py

    import argparse

    def parse_args():
        # setup argparse
        parser = argparse.ArgumentParser()
    
        # add arguments
        parser.add_argument("--input_data", type=str, help="path of input data")
        parser.add_argument("--training_data", type=str, default="./", help="output path of train data")
        parser.add_argument("--val_data", type=str, default="./", help="output path of validation data")
    
        # parse args
        args = parser.parse_args()
        # return args
        return args
        
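    # Input and Output must be imported for these annotations to resolve; the
    # original snippet does not show the import (azure.ai.ml provides classes
    # with these names).
    def prepare_data_component(
        input_data: Input(type="uri_folder"),
        training_data: Output(type="uri_folder"),
        val_data: Output(type="uri_folder"),
    ):
        ...  # function body not shown in the original answer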

    As the parse_args code above shows, use default="./" for the output URI folder arguments. These folders are created inside Azure ML's blob storage. Specifying a path of my own choice inside Azure blob storage did not work for me.
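
The traceback in the question shows the save path lying under the INPUT_input_data mount, and the error reports that location as read-only. The sketch below shows one way the script could build every save path from the output argument instead, mirroring the input folder layout. It is only an illustration using the standard library: `mirrored_path`, `write_outputs`, `demo`, and the folder name `training_data_out` are hypothetical names, and the byte-level `transform` stands in for the PyTorch/PIL step, which the original answer does not show.

```python
import argparse
import tempfile
from pathlib import Path


def parse_args(argv=None):
    """Parse the folder paths that the component command passes in."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_data", type=str, help="path of input data")
    parser.add_argument("--training_data", type=str, default="./", help="output path of train data")
    parser.add_argument("--val_data", type=str, default="./", help="output path of validation data")
    return parser.parse_args(argv)


def mirrored_path(src, input_root, output_root):
    """Map a file under input_root to the same relative location under output_root."""
    return Path(output_root) / Path(src).relative_to(input_root)


def write_outputs(input_root, output_root, transform=lambda data: data):
    """Apply transform to each input file and write the result under output_root only."""
    written = []
    for src in sorted(Path(input_root).rglob("*")):
        if not src.is_file():
            continue
        dst = mirrored_path(src, input_root, output_root)
        # Create the sub-folders inside the output folder before writing.
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Placeholder for the real image step, e.g. saving a transformed image to dst.
        dst.write_bytes(transform(src.read_bytes()))
        written.append(dst)
    return written


def demo():
    """Run the sketch against temporary folders that stand in for the mounts."""
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = Path(tmp) / "INPUT_input_data"
        out_dir = Path(tmp) / "training_data_out"  # hypothetical output folder name
        (in_dir / "train" / "chickens").mkdir(parents=True)
        (in_dir / "train" / "chickens" / "trial.jpg").write_bytes(b"fake image bytes")
        out_dir.mkdir()
        args = parse_args(["--input_data", str(in_dir), "--training_data", str(out_dir)])
        written = write_outputs(args.input_data, args.training_data)
        return [p.relative_to(out_dir).as_posix() for p in written], args.val_data


if __name__ == "__main__":
    print(demo())
```

In a real component, the paths would come from `${{inputs.input_data}}` and `${{outputs.training_data}}` in the yaml command rather than from temporary folders, and nothing would be written back under the input path.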