Python/boto3 here. I have an S3 bucket with several "subfolders", and in one of these "subfolders" (yes, I know there's no such thing in S3, but when you look at the layout below you'll understand) I have several dozen files, zero or more of which will be Excel (XLSX) files. Here's what my bucket looks like:
my_bucket/
    Fizz/
    Buzz/
    Foo/
        file1.jpg
        file2.jpg
        file3.txt
        file4.xlsx
        file5.pdf
        file6.xlsx
        file7.png
        ...etc.
So for, say, file4.xlsx, the bucket is my_bucket and the key is Foo/file4.xlsx (if I understand S3 properly). For file7.png, the bucket is still my_bucket and its key is Foo/file7.png, etc.
I need to look under this Foo/ "subfolder" for any file that ends with a .xlsx extension, and if one exists, do an S3 GetObject on that Excel file. It's fine if no Excel files exist, and it's fine if multiple exist. I just need to do a GetObject on the first one I find, if one is even there at all.
I understand that a typical boto3 invocation for getting an S3 object looks like:
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my_bucket", Key="Foo/file2.jpg")
But I'm not sure how to list all the my_bucket/Foo/* contents, filter for the first *.xlsx, and do the get_object(...) on that specific file. Can anyone nudge me in the right direction?
This isn't possible in a single S3 call: the ListObjectsV2 API only filters by key prefix, not by suffix, so AWS can't return just the .xlsx objects for you. But you can do it in two steps: list the keys under the prefix, filter by suffix on the client side, then call get_object on the first match.
import boto3

s3_client = boto3.client('s3')
bucket = 'my_bucket'
prefix = 'Foo/'

# Step 1: list every key under the prefix and keep the .xlsx ones.
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []
for response in response_iterator:
    # 'Contents' is absent when a page contains no objects
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.endswith('.xlsx'):
            file_names.append(key)

# Step 2: fetch the first match, if any.
if file_names:
    response = s3_client.get_object(Bucket=bucket, Key=file_names[0])
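Since only the first match matters, the listing can also short-circuit as soon as one .xlsx key turns up instead of collecting every key first. A minimal sketch of that variant (the helper name first_xlsx_key is my own, and the bucket/prefix values are placeholders taken from the question's layout):

```python
def first_xlsx_key(s3_client, bucket, prefix):
    """Return the first key ending in '.xlsx' under the prefix, or None."""
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page that contains no objects
        for obj in page.get('Contents', []):
            if obj['Key'].endswith('.xlsx'):
                return obj['Key']  # stop at the first match
    return None

# Usage against a real client (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client('s3')
#   key = first_xlsx_key(s3, 'my_bucket', 'Foo/')
#   if key is not None:
#       data = s3.get_object(Bucket='my_bucket', Key=key)['Body'].read()
```

Note that "first" here means first in the lexicographic key order that ListObjectsV2 returns, which may or may not match what you expect from a folder view.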