amazon-web-services amazon-sagemaker automl

How do I explicitly set SageMaker Autopilot's validation set?


The example notebook at https://github.com/awslabs/amazon-sagemaker-examples/blob/master/autopilot/autopilot_customer_churn.ipynb says the following about the Analyzing Data step:

The dataset is analyzed and Autopilot comes up with a list of ML pipelines that should be tried out on the dataset. The dataset is also split into train and validation sets.

Presumably, Autopilot uses this validation set to select the best-performing model candidates to return to the user. However, I have not found a way to set this validation set manually.
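To make the assumed mechanism concrete, here is a minimal sketch of "split the data, then pick the candidate that scores best on the validation part". It is not Autopilot's code: the 80/20 split, the accuracy metric, and the helper names (`train_validation_split`, `pick_best_candidate`) are hypothetical, and the candidates are fixed predictors rather than models trained on the training split.

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def accuracy(predict, rows):
    """Fraction of (features, label) rows that predict() labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def pick_best_candidate(candidates, validation):
    """Return the name of the candidate with the highest validation accuracy."""
    return max(candidates, key=lambda name: accuracy(candidates[name], validation))

# Toy data: the label is True when the single feature exceeds 50.
rows = [(x, x > 50) for x in range(100)]
train, validation = train_validation_split(rows)

# Hypothetical candidates; a real AutoML system would fit these on `train`.
candidates = {
    "threshold_50": lambda x: x > 50,
    "always_false": lambda x: False,
}
best = pick_best_candidate(candidates, validation)
```

The point of the question is the `validation` argument: whoever controls that split controls which candidate wins, which is why being able to fix it explicitly matters.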

For example, Google AutoML lets users put TRAIN, VALIDATE, or TEST values in a data_split column to control which data points go into which set.
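As an illustration of that Google AutoML convention, the sketch below appends a split column to tabular data and writes it as CSV. It only shows the shape of such a file; the column name `data_split`, the 80/10/10 fractions, and the helper names are assumptions, and nothing here implies SageMaker Autopilot would read such a column.

```python
import csv
import io
import random

def assign_splits(n_rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return a shuffled list of TRAIN/VALIDATE/TEST labels, one per row."""
    n_train = int(n_rows * fractions[0])
    n_validate = int(n_rows * fractions[1])
    labels = (["TRAIN"] * n_train + ["VALIDATE"] * n_validate
              + ["TEST"] * (n_rows - n_train - n_validate))
    random.Random(seed).shuffle(labels)
    return labels

def add_split_column(rows, header, column="data_split", **kwargs):
    """Return rows as CSV text with an extra split-label column appended."""
    labels = assign_splits(len(rows), **kwargs)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header + [column])
    for row, label in zip(rows, labels):
        writer.writerow(list(row) + [label])
    return out.getvalue()

# Hypothetical churn-like data: one feature and a 0/1 label.
rows = [[i, i % 2] for i in range(10)]
csv_text = add_split_column(rows, ["feature", "churn"])
```

Assigning labels by shuffling a fixed-size list (rather than drawing a random label per row) keeps the split sizes exact, which is usually what you want when the split is set by hand.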

Is something like this currently possible with SageMaker Autopilot?


Solution

  • I'm afraid you can't do this at the moment: Autopilot builds the validation set itself.