amazon-sagemaker, amazon-sagemaker-studio

How to configure the default bucket in a SageMaker pipeline?


I am experimenting with training models in Amazon SageMaker/Studio via SageMaker Pipelines. Following the AWS samples and docs, in my Jupyter notebook I set up the SageMaker session and execution role with the code shown below. There is also a default bucket that gets set up, returned by sess.default_bucket().

How is this default bucket configured? Or can we simply point it at the bucket we want?



import sagemaker
import os
try:
    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()
except ValueError:
    import boto3
    # create a boto3 session using the local profile we defined
    if local_profile_name:
        os.environ['AWS_PROFILE'] = local_profile_name  # set env var because local mode cannot use a boto3 session
        #bt3 = boto3.session.Session(profile_name=local_profile_name)
        #iam = bt3.client('iam')
        # create sagemaker session with boto3 session
        #sess = sagemaker.Session(boto_session=bt3)
    iam = boto3.client('iam')
    sess = sagemaker.Session()
    # get role arn
    role = iam.get_role(RoleName=role_name)['Role']['Arn']


print(sess.default_bucket())  # s3 bucket name

Solution

  • Within the pipeline context, PipelineSession should be used wherever possible.

    This class inherits from the SageMaker Session and provides convenient methods for manipulating entities and resources that Amazon SageMaker uses, such as training jobs, endpoints, and input datasets in S3. When composing a SageMaker model-building pipeline, PipelineSession is recommended over the regular SageMaker Session.

    At this point, you can specify the bucket directly as a parameter:

    from sagemaker.workflow.pipeline_context import PipelineSession
    
    pipeline_session = PipelineSession(default_bucket=your_default_bucket)
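
    A minimal sketch of how this session might then be wired into a pipeline definition (the bucket name, pipeline name, and empty step list below are placeholders, not from the original post):

    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import PipelineSession

    # placeholder bucket name -- substitute your own bucket
    pipeline_session = PipelineSession(default_bucket="my-custom-sagemaker-bucket")

    # steps built with this session (processors, estimators, etc.) will upload
    # their artifacts to this bucket instead of the auto-created
    # sagemaker-<region>-<account-id> default
    pipeline = Pipeline(
        name="my-example-pipeline",  # placeholder name
        steps=[],                    # add your ProcessingStep / TrainingStep objects here
        sagemaker_session=pipeline_session,
    )

    Outside of a pipeline, the plain sagemaker.Session(default_bucket=...) constructor accepts the same argument, so the sess.default_bucket() call in the question would return the bucket you specified rather than the auto-generated one.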