
Where to place the entry_point file when using XGBoost as a framework in SageMaker


Looking at the following source code taken from here (SDK v2):

import boto3
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

# initialize hyperparameters
hyperparameters = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "verbosity":"1",
        "objective":"reg:squarederror",  # "reg:linear" is a deprecated alias for this objective
        "num_round":"50"}

# set an output path where the trained model will be saved
bucket = sagemaker.Session().default_bucket()
prefix = 'DEMO-xgboost-as-a-framework'
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-xgb-framework')

# construct a SageMaker XGBoost estimator
# specify the entry_point to your xgboost training script
estimator = XGBoost(entry_point = "your_xgboost_abalone_script.py", 
                    framework_version='1.2-2',
                    hyperparameters=hyperparameters,
                    role=sagemaker.get_execution_role(),
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',
                    output_path=output_path)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = TrainingInput("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input})

Where does the your_xgboost_abalone_script.py file have to be placed? So far I have used XGBoost as a built-in algorithm from my local machine with similar code (i.e. I spun up a training job remotely). Thanks!

PS:

Looking at this, and at source_dir, I wonder whether one can upload Python files to S3. In that case, I take it the upload has to be a tar.gz? Thanks!


Solution

  • your_xgboost_abalone_script.py can be created locally. The path you pass as entry_point is resolved relative to the directory from which you run the SageMaker SDK code.

    I.e. your_xgboost_abalone_script.py can sit in the same directory as the script that calls the SageMaker SDK (the "source code").

    For example, if you have your_xgboost_abalone_script.py in the same directory as the source code:

    .
    ├── source_code.py
    └── your_xgboost_abalone_script.py
    

    Then you can point to this file exactly as the documentation shows:

    estimator = XGBoost(entry_point = "your_xgboost_abalone_script.py", 
    .
    .
    .
    )
    

    The SDK will take your_xgboost_abalone_script.py, package it into a tar.gz archive, and upload it to S3 on your behalf, so you do not have to upload the script yourself.
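
To make the two points above concrete (the relative path and the tarball), here is a minimal sketch using only the Python standard library. It is not the SDK's actual implementation: the function name package_entry_point is hypothetical, and the archive name sourcedir.tar.gz is an assumption chosen for illustration. It only shows how a relative entry_point resolves against the current working directory and what ends up in a tar.gz built from it. Regarding the PS: as far as I know, the SDK documentation says that if source_dir is an S3 URI, it must point to a tar.gz file, so an archive like this one is the shape you would upload yourself.

```python
import tarfile
from pathlib import Path


def package_entry_point(entry_point: str,
                        archive_name: str = "sourcedir.tar.gz") -> Path:
    """Resolve entry_point against the current directory and pack it into a tar.gz.

    Hypothetical helper for illustration only; the SageMaker SDK does its own
    packaging and upload when estimator.fit() is called.
    """
    # A relative path resolves against the current working directory,
    # i.e. the directory from which the SDK code is run.
    script = Path(entry_point).resolve()
    if not script.is_file():
        raise FileNotFoundError(f"entry_point not found: {script}")

    archive = Path.cwd() / archive_name
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores the script at the archive root,
        # without any local directory components.
        tar.add(script, arcname=script.name)
    return archive
```

Calling package_entry_point("your_xgboost_abalone_script.py") from the directory shown in the tree above would write the archive next to source_code.py. Calling it from a different directory raises FileNotFoundError, which is the usual symptom of running the SDK code from the wrong working directory.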