We want to tune a SageMaker PipelineModel with a HyperparameterTuner (or something similar) where several components of the pipeline have associated hyperparameters. Both components in our case are realized via SageMaker containers for ML algorithms.
model = PipelineModel(..., models = [ our_model, xgb_model ])
deploy = Estimator(image_uri = model, ...)
...
tuner = HyperparameterTuner(deploy, ..., tune_parameters, ...)
tuner.fit(...)
Now, there is of course the problem of how to distribute the tune_parameters to the individual pipeline steps during tuning.
In scikit-learn this is achieved by naming each tuning parameter after its step: <StepName>__<ParameterName>.
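For reference, here is a minimal sketch of the scikit-learn convention meant above. The step names "scale" and "clf" and the chosen estimators are only illustrative and have nothing to do with our actual pipeline:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Two named steps; the names are what the "<StepName>__" prefix refers to.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Each keyword is routed to the step named before the double underscore.
pipe.set_params(scale__with_mean=False, clf__C=0.5)

routed = (pipe.named_steps["scale"].with_mean, pipe.named_steps["clf"].C)
print(routed)
```

The same prefixed names are what GridSearchCV and similar tools accept as keys of their parameter grid.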
I don't see a way to achieve something similar with SageMaker, though. Searching for the two keywords brings up what looks like the same question here, but it does not really cover what we want to do.
Any suggestions on how to achieve this?
You can use an entry_point script to define the pipeline model and then pass the estimator to HyperparameterTuner. Note that the entry_point script (main.py in this case) must print the objective metric in exactly the format matched by the regex in metric_definitions, so that HyperparameterTuner can parse it and select the best model.
import os

from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner

# config, role, region, default_bucket, base_job_prefix, pipeline_session,
# fixed_hyperparameters, hyperparameter_ranges, get_xgboost_image_uri and
# compute_max_parallel_jobs are assumed to be defined elsewhere (not shown).

# Create an estimator (Column Transformer + XGBoost Classifier)
estimator = Estimator(
    image_uri=get_xgboost_image_uri(region=region, version="1.7-1"),
    role=role,
    instance_count=config["model_training"]["instance_count"],
    instance_type=config["model_training"]["instance_type"],
    output_path=f"s3://{default_bucket}/{base_job_prefix}/model",
    hyperparameters=fixed_hyperparameters,
    sagemaker_session=pipeline_session,
    # main.py builds the whole pipeline and trains it
    entry_point=os.path.join("steps", "model_training", "main.py"),
)

# Define Hyperparameter Tuning Job
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name=config["model_training"]["objective_metric_name"],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=compute_max_parallel_jobs(hyperparameter_ranges),
    max_parallel_jobs=compute_max_parallel_jobs(hyperparameter_ranges),
    objective_type=config["model_training"]["objective_type"],
    # The tuner extracts the metric from the training logs with this regex
    metric_definitions=[
        {
            "Name": "validation:auc",
            "Regex": "validation:auc=([0-9\\.]+)",
        }
    ],
)
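As far as I know, SageMaker does not route hyperparameters to pipeline steps for you, so the routing would have to happen inside main.py. Below is a minimal sketch of one way to do that, borrowing the scikit-learn `<StepName>__<ParameterName>` convention. The helper names `split_by_step` and `format_metric`, the `_shared` bucket, and the example step names `prep` and `xgb` are all hypothetical and not part of any SageMaker API:

```python
import re
from collections import defaultdict

# Same pattern as the "Regex" entry in metric_definitions above.
METRIC_REGEX = r"validation:auc=([0-9\.]+)"


def split_by_step(hyperparameters, sep="__"):
    """Group flat '<step>__<param>' keys into one dict per pipeline step."""
    per_step = defaultdict(dict)
    for name, value in hyperparameters.items():
        step, found, param = name.partition(sep)
        if not (step and found and param):
            # No usable step prefix: keep it in a shared bucket instead of guessing.
            per_step["_shared"][name] = value
        else:
            per_step[step][param] = value
    return dict(per_step)


def format_metric(name, value):
    """Build the log line that the tuner's metric regex has to match."""
    return f"{name}={value:.4f}"


# Hypothetical values, as a tuning job might hand them to main.py (as strings).
flat = {"prep__n_components": "8", "xgb__max_depth": "6", "xgb__eta": "0.1", "seed": "42"}
per_step = split_by_step(flat)

# Print the objective metric so it shows up in the training logs.
line = format_metric("validation:auc", 0.9132)
print(line)

# Sanity check: the printed line is parseable by the tuner's regex.
parsed = re.search(METRIC_REGEX, line).group(1)
```

On the tuner side, the keys of hyperparameter_ranges would then carry the same prefixes (for example `xgb__max_depth`), and main.py passes each per-step dict to the component it belongs to.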