amazon-web-services amazon-sagemaker amazon-machine-learning aws-pipeline

AWS SageMaker: Can I pass a sagemaker.workflow.parameters.ParameterString to an SKLearnProcessor?


I am creating a SageMaker pipeline. In the evaluation step, I would like to pass an argument to my preprocess.py script.

There are a few examples online of how to do this (one sample is shown below), but they all use static values. I want to pass a workflow parameter (a string in this case) to the script.

I have tried multiple approaches to no avail, and I even opened a GitHub issue, which has received no response so far.

The linked GitHub issue details every approach I have tried, but it all boils down to the fact that a workflow parameter is only evaluated at runtime.

I would like to know if what I want to do is possible or not.

Option 1: Typical approach: passing static values

sklearn_processor.run(
    code="preprocess.py",
    inputs=[
        ProcessingInput(source='my_package/', destination='/opt/ml/processing/input/code/my_package/')
    ],
    outputs=[
        ProcessingOutput(output_name="test_transform_data",
                         source='/opt/ml/processing/output/test_transform',
                         destination=out_path),
    ],
    arguments=["--time-slot-minutes", "30min"]
)

Source for the sample code: "How to pass region to the SKLearnProcessor - botocore.exceptions.NoRegionError: You must specify a region"

Option 2: My approach: passing a workflow parameter

step_args = myprocessor.run(
    inputs=[
        ProcessingInput(source=s3_full_address, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="raw", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="generate_train_test_data.py",
    arguments=["--s3_prefix", s3_prefix]
)

Here, s3_prefix is a workflow parameter defined as s3_prefix = ParameterString(name="InputPrefix", default_value="myprefix")
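
To illustrate what "only evaluated at runtime" means here, below is a toy sketch in plain Python. It does not use the SageMaker SDK: DeferredString and resolve_arguments are hypothetical names invented for this illustration. They only model the general idea that the argument list holds a placeholder object, not a string, until an execution supplies a value.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class DeferredString:
    """Hypothetical stand-in for a pipeline parameter (NOT the SageMaker class)."""
    name: str
    default_value: str


Argument = Union[str, DeferredString]


def resolve_arguments(arguments: List[Argument], overrides: Dict[str, str]) -> List[str]:
    """Replace each placeholder with its execution-time value, or its default."""
    resolved = []
    for arg in arguments:
        if isinstance(arg, DeferredString):
            # The concrete value is only known once an execution starts.
            resolved.append(overrides.get(arg.name, arg.default_value))
        else:
            resolved.append(arg)
    return resolved


# At definition time the list contains a placeholder, not a string.
s3_prefix = DeferredString(name="InputPrefix", default_value="myprefix")
arguments = ["--s3_prefix", s3_prefix]

# At "execution" time the placeholder is swapped for a real value.
default_run = resolve_arguments(arguments, overrides={})
custom_run = resolve_arguments(arguments, overrides={"InputPrefix": "2023/01"})
```

In this toy model, any code that needs a concrete string while the pipeline is being defined has nothing to work with yet, which matches the behaviour described in the question.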


Solution

  • To pass a workflow parameter to your script, use the job_arguments option of ProcessingStep.

    1. Step definition

    Update your step definition to add the job_arguments argument:

    ProcessingStep(
        name="step-name",
        processor=my_processor,
        # my_argument can be a workflow parameter such as a ParameterString
        job_arguments=[
            "--my_argument", my_argument
        ],
        ...
        code="myscript.py"
    )
    

    2. Reading the argument

    In your script (myscript.py in this example), read the argument as follows:

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser()

        # job_arguments from the step arrive as command-line arguments to the script
        parser.add_argument('--my_argument', type=str)

        # parse_known_args returns (namespace, list_of_unrecognized_args)
        return parser.parse_known_args()

    args, _ = parse_args()
    my_argument = args.my_argument
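
    To check the parsing logic locally, without running a pipeline, you can feed the parser an explicit argument list. This is a minimal sketch: the argv values, including the extra --extra-flag, are made up for illustration and simply stand in for whatever command line the job ends up receiving.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--my_argument", type=str)
    # parse_known_args returns (namespace, leftovers) and does not fail
    # on flags the parser was never told about.
    return parser.parse_known_args(argv)


# Simulated command line; in a real job argv is None and sys.argv is used.
args, unknown = parse_args(["--my_argument", "myprefix", "--extra-flag", "1"])
my_argument = args.my_argument
```

    Using parse_known_args instead of parse_args means the script keeps working if additional arguments are passed that it does not declare; they end up in the second return value.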