
AWS SageMaker Notebook's Default S3 Bucket - Can't Access Uploaded Files within Notebook


In SageMaker Studio, I created directories and uploaded files to my SageMaker session's default S3 bucket using the GUI, and was exploring how to work with those uploaded files from a SageMaker Studio notebook.

Within the SageMaker Studio Notebook, I ran

import sagemaker
import boto3

sess = sagemaker.Session()
bucket = sess.default_bucket()  # sagemaker-abcdef
prefix = "folderJustBelowRoot"

conn = boto3.client("s3")
conn.list_objects(Bucket=bucket, Prefix=prefix)
# The response dictionary includes 'HTTPStatusCode': 200 and
# 'server': 'AmazonS3', so the request-response round trip was successful.

What I don't understand is why the 'Contents' key and its value are missing from the conn.list_objects response dictionary.

And when I browse to my SageMaker default bucket in the S3 console, the files I uploaded do not appear.

===============================================================

I was expecting

  • the response from conn.list_objects(Bucket=bucket, Prefix=prefix) to contain the 'Contents' key (within my SageMaker Studio Notebook)

  • the S3 console to show the files I uploaded to 'my SageMaker's default bucket'


Solution

  • You can upload and download files between Amazon SageMaker and Amazon S3 using the SageMaker Python SDK. The SageMaker S3 utilities provide S3Uploader and S3Downloader classes to work with S3 easily from within SageMaker Studio notebooks.
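    A minimal sketch of that round trip (assuming the sagemaker SDK is installed and the notebook's execution role can access the default bucket; the local file path "data/train.csv" is illustrative):

```python
import sagemaker
from sagemaker.s3 import S3Uploader, S3Downloader

sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = "folderJustBelowRoot"

# Upload a file from the Studio EFS volume to S3; returns the object's S3 URI.
# "data/train.csv" is an illustrative path.
s3_uri = S3Uploader.upload(
    local_path="data/train.csv",
    desired_s3_uri=f"s3://{bucket}/{prefix}",
    sagemaker_session=sess,
)

# The object now exists in S3, so listing the prefix shows it ...
print(S3Downloader.list(f"s3://{bucket}/{prefix}", sagemaker_session=sess))
# ... and boto3's list_objects will include a 'Contents' key as well.
```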

    A comment about the 'file system' in your question 2: files uploaded through the Studio GUI are stored on your SageMaker Studio user profile's Amazon Elastic File System (Amazon EFS) volume, not on an EBS volume (classic SageMaker notebook instances use EBS volumes), and not in S3 - which is why they don't appear in your default S3 bucket. Refer to this blog for a more detailed overview of the SageMaker architecture.
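    This also explains the missing 'Contents' key in question 1: ListObjects returns HTTP 200 even when zero keys match the prefix, and in that case the 'Contents' key is simply absent from the response. A defensive sketch (the response dict below mimics boto3's shape for an empty prefix; the values are illustrative):

```python
# Roughly what boto3 returns when no object matches the prefix:
# HTTP 200, but no 'Contents' key at all.
response = {
    "ResponseMetadata": {"HTTPStatusCode": 200},
    "Name": "sagemaker-abcdef",
    "Prefix": "folderJustBelowRoot",
    "IsTruncated": False,
}

# Guard with dict.get so an empty prefix yields an empty list
# instead of a KeyError.
keys = [obj["Key"] for obj in response.get("Contents", [])]
print(keys)  # → []
```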