How can I filter S3 objects by size using AWS SDK for PHP v3 Aws/ResultPaginator->Search and JMESPath expression?


I need some assistance with filtering S3 results using the AWS SDK for PHP v3 and JMESPath. Filtering by a number does not work with the PHP SDK the way the JMESPath documentation and online examples suggest.

<?php
// test.php

// Load the Composer autoloader so the SDK classes resolve
require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Create S3 client
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1'
]);

$bucket = 'my-bucket-name';
$prefix = 'path/to/my/objects';

// Call list-objects-v2
$awspaginator = $s3->getPaginator('ListObjectsV2', [
    'Bucket' => $bucket,
    'Prefix' => $prefix
]);

// Apply filter to paginator
$jmes = "reverse(Contents[?Size>`0`].{Key: Key, Date: LastModified, Size: Size}) | [-10:]";
$results = $awspaginator->search($jmes);

// Echo results
$i = 0;
foreach ($results as $result) {
    // Pass true so print_r returns the string instead of printing it directly
    echo "\nResult: " . print_r($result, true);
    $i++;
}
echo "\nCount: " . $i . PHP_EOL;
?>

This outputs Count: 0

But if I replace Size>`0` with StorageClass=='STANDARD', I get the 10 most recent objects as expected.

I've attempted the following Size expressions without any luck.

  • Size>0 // returns error: unexpected number token
  • Size>'0' // succeeds: returns no results
  • Size>`0` // succeeds: returns no results
  • Size!=`0` // returns results but does not filter out zero size objects
  • Size!=\"0\" // returns results but does not filter out zero size objects

Note that the equivalent s3api query in the AWS CLI works just fine, so this seems to be specific to the PHP SDK's search method.

aws s3api list-objects-v2 \
--bucket my-bucket-name \
--prefix path/to/my/objects \
--query "reverse(Contents[?Size>\`0\`].{Key: Key, Date: LastModified, Size: Size}) | [-10:]"

Any help is appreciated!


Solution

  • I'm struggling to find this documented anywhere, but it appears that Size is unmarshalled as a string. I was able to make your example work with [?to_number(Size)>`0`] or indeed with [?Size!='0'].
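
    The difference can be reproduced without calling S3 at all. The sketch below runs both filters against a hand-built array through JmesPath\Env::search from the mtdowling/jmespath.php package, which the SDK depends on. The sample data, its string-typed Size values, and the Composer autoload path are assumptions for illustration, not output captured from a real bucket.

```php
<?php
// Assumes Composer has installed mtdowling/jmespath.php
// (it is pulled in as a dependency of aws/aws-sdk-php).
require 'vendor/autoload.php';

use JmesPath\Env as JmesPath;

// Hand-built sample shaped like one ListObjectsV2 page. Size is a string
// here to mimic what the PHP SDK appears to return (assumption).
$page = [
    'Contents' => [
        ['Key' => 'path/to/my/objects/', 'Size' => '0'],
        ['Key' => 'path/to/my/objects/a.txt', 'Size' => '1024'],
    ],
];

// Ordering comparisons apply to numbers only, so a string Size never matches.
$withoutCast = JmesPath::search('Contents[?Size>`0`].Key', $page);

// Casting with to_number() first lets the comparison behave numerically.
$withCast = JmesPath::search('Contents[?to_number(Size)>`0`].Key', $page);

var_dump($withoutCast); // expected: an empty array
var_dump($withCast);    // expected: only the a.txt key
```

    In the script from the question, the same cast would go inside the expression passed to $awspaginator->search().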

    This appears to be a bug, or at least a failure in documentation, as the docs state:

    The AWS CLI supports JMESPath. Expressions you write for CLI output are 100 percent compatible with expressions written for the AWS SDK for PHP.

    https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_jmespath.html

    The only thing I have found that even alludes to this behaviour is tenuously related:

    https://forums.aws.amazon.com/message.jspa?messageID=752541#jive-message-312324

    Here the problem is that the DynamoDB API expects to receive numbers as strings. An Amazon representative notes that this behaviour exists because (a) the SDK has to support 32-bit environments that can't handle integers over 2 billion, and (b) all AWS SDKs are generated automatically from a language-agnostic set of data files, and the maintainers prefer to avoid making exceptions where they can. This seems to imply that numbers represented as strings may occur broadly across the SDK. That said, I can't find any mention of it elsewhere.


    Whether or not it's deliberate, the cause appears to be that the PHP SDK's Api/Parser/XmlParser doesn't have a mapping for the long type that Size is declared as. It falls back to its default behaviour of returning the value as a string:

    https://github.com/aws/aws-sdk-php/blob/master/src/Api/Parser/XmlParser.php#L23-L31
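
    To make the fallback concrete, here is a simplified sketch of a type-dispatch table with a string default. It is not the SDK's source: the function name parseScalar and the set of converters are hypothetical, chosen only to show how a type with no entry (such as long) would come back as the raw string.

```php
<?php
// Hypothetical, simplified illustration of a type-dispatch table with a
// string fallback. This is NOT the SDK's actual code.
function parseScalar(string $type, string $raw)
{
    // Types that have an explicit conversion.
    $converters = [
        'boolean' => fn(string $v) => $v === 'true',
        'integer' => fn(string $v) => (int) $v,
        'float'   => fn(string $v) => (float) $v,
    ];

    if (isset($converters[$type])) {
        return $converters[$type]($raw);
    }

    // No entry for this type (for example 'long'), so the raw text is
    // returned unchanged as a string.
    return $raw;
}

var_dump(parseScalar('integer', '1024')); // int(1024)
var_dump(parseScalar('long', '1024'));    // string(4) "1024"
```

    Under this kind of scheme, a field declared as long reaches JMESPath as a string, which would explain why the numeric comparison only works after to_number().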