Assume 200,000 images in a flat Amazon S3 bucket.
The bucket looks something like this:
000000-1.jpg
000000-2.jpg
000000-3.jpg
000000-4.jpg
000001-1.jpg
000001-2.jpg
000002-1.jpg
...
ZZZZZZ-9.jpg
ZZZZZZ-10.jpg
(a 6-digit hash, followed by a count, followed by the extension)
If I need all files matching 000001-*.jpg, what's the most efficient way to get them?
In PHP I'd use rglob($path,'{000001-*.jpg}',GLOB_BRACE) to get an array of matches, but I don't think that works remotely.
I can get a list of all files in the bucket, then find matches in the array, but that seems like an expensive request.
What do you recommend?
Amazon provides a way to do this directly through the S3 API.
You can use the prefix option when listing S3 objects to return only the objects whose keys begin with that prefix, e.g. using the AWS SDK for PHP:
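// Load the AWS SDK for PHP (1.x); adjust the path to your installation
require_once 'sdk.class.php';

// Instantiate the class
$s3 = new AmazonS3();

// List only the objects whose keys begin with "000001-"
$response = $s3->list_objects('my-bucket', array(
    'prefix' => '000001-'
));

// Success?
var_dump($response->isOK());
// Number of matching objects returned
var_dump(count($response->body->Contents));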
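One caveat worth sketching: S3 returns at most 1,000 keys per list request, so a prefix with more matches than that has to be paged through with the marker option. The loop below shows the idea against a stand-in callable rather than a live bucket. The names collect_keys and $fetchPage are hypothetical, and the page format (keys, truncated) is an assumption made for this example, not the SDK's actual response shape.

```php
<?php
// Hypothetical helper: gathers every key under a prefix by following markers.
// $fetchPage($prefix, $marker) stands in for one list request and must return
// array('keys' => string[], 'truncated' => bool). This is an assumed format.
function collect_keys($fetchPage, $prefix)
{
    $all = array();
    $marker = null;
    do {
        $page = $fetchPage($prefix, $marker);
        $all = array_merge($all, $page['keys']);
        // The last key of a page becomes the marker for the next request.
        if (!empty($page['keys'])) {
            $marker = end($page['keys']);
        }
    } while ($page['truncated'] && !empty($page['keys']));
    return $all;
}

// Fake "bucket" that serves two keys per page, to exercise the loop locally.
$bucket = array('000001-1.jpg', '000001-2.jpg', '000001-3.jpg', '000002-1.jpg');
$fakeFetch = function ($prefix, $marker) use ($bucket) {
    $matches = array();
    foreach ($bucket as $key) {
        $afterMarker = ($marker === null || strcmp($key, $marker) > 0);
        if (strpos($key, $prefix) === 0 && $afterMarker) {
            $matches[] = $key;
        }
    }
    return array(
        'keys' => array_slice($matches, 0, 2),
        'truncated' => count($matches) > 2,
    );
};

$keys = collect_keys($fakeFetch, '000001-');
print_r($keys);
```

For the example in the question (a handful of files per hash) a single request is enough; paging only matters once a prefix can match more than one page of keys.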
You might also find the delimiter option useful: with '-' as the delimiter, the response groups keys by everything up to the first '-', so you could use it to get a list of all the unique 6-digit hashes.
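To illustrate what the delimiter option does, the sketch below imitates S3's grouping locally on a few of the sample keys: every key that contains the delimiter is rolled up into a common prefix ending at the first occurrence of the delimiter. common_prefixes is a hypothetical helper written for this example, not an SDK function, and no request is made.

```php
<?php
// Hypothetical local imitation of S3's delimiter grouping (no network calls).
function common_prefixes(array $keys, $delimiter)
{
    $prefixes = array();
    foreach ($keys as $key) {
        $pos = strpos($key, $delimiter);
        if ($pos !== false) {
            // Keep everything up to and including the first delimiter.
            $prefixes[substr($key, 0, $pos + strlen($delimiter))] = true;
        }
    }
    // Using array keys above removes duplicates while preserving order.
    return array_keys($prefixes);
}

$sample = array(
    '000000-1.jpg', '000000-2.jpg',
    '000001-1.jpg', '000001-2.jpg',
    '000002-1.jpg',
);
$hashes = common_prefixes($sample, '-');
print_r($hashes);
```

With the real API, the equivalent values come back in the CommonPrefixes elements of the list response; trimming the trailing '-' from each one gives the bare hash.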