This is more of a conceptual clarification. I can already get the actual counts with Boto3 by repeating the query with the LastEvaluatedKey of the previous response.
I want to count the items matching certain conditions in DynamoDB. I am using Select='COUNT', which according to the docs [1] should return just the count of matching items, so my assumption was that the response would not be paginated.
COUNT - Returns the number of matching items, rather than the matching items themselves.
When I try it via the AWS CLI, my assumption seems correct (like the REST API samples in the docs [1]):
aws dynamodb query \
--table-name 'my-table' \
--index-name 'classification-date-index' \
--key-condition-expression 'classification = :col AND #dt BETWEEN :start AND :end' \
--expression-attribute-values '{":col" : {"S":"INTERNAL"}, ":start" : {"S": "2020-04-10"}, ":end" : {"S": "2020-04-25"}}' \
--expression-attribute-names '{"#dt" : "date"}' \
--select 'COUNT'
{
    "Count": 18817,
    "ScannedCount": 18817,
    "ConsumedCapacity": null
}
But when I try it with Python 3 and Boto3, the response is paginated, and I have to repeat the query until no LastEvaluatedKey is returned.
In [22]: table.query(IndexName='classification-date-index', Select='COUNT', KeyConditionExpression= Key('classification').eq('INTERNAL') & Key('date').between('2020-04-10', '2020-04-25'))
{'Count': 5667,
'ScannedCount': 5667,
'LastEvaluatedKey': {'classification': 'INTERNAL',
'date': '2020-04-14',
's3Path': '<redacted>'},
'ResponseMetadata': {'RequestId': 'TH3ILO0P47QB7GAU9M3M98BKJVVV4KQNSO5AEMVJF66Q9ASUAAJG',
'HTTPStatusCode': 200,
'HTTPHeaders': {'server': 'Server',
'date': 'Sat, 25 Apr 2020 13:32:36 GMT',
'content-type': 'application/x-amz-json-1.0',
'content-length': '230',
'connection': 'keep-alive',
'x-amz-crc32': '133035383'},
'RetryAttempts': 0}}
I expected the same behaviour from the Boto3 SDK as from the AWS CLI, since the response seems to be smaller than 1 MB. The docs are slightly conflicting:
The "Paginating Table Query Results" page [2] says:
DynamoDB paginates the results from Query operations. With pagination, the Query results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. A single Query only returns a result set that fits within the 1 MB size limit.
Whereas the "Query" page [1] says:
A single Query operation will read up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then apply any filtering to the results using FilterExpression.
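A rough back-of-the-envelope check using the numbers in this question, assuming 1 MB means 1,048,576 bytes and that items are roughly uniform in size (neither is confirmed by the responses above). The first Boto3 page stopped after 5,667 scanned items even though the COUNT response body was only 230 bytes (its `content-length`), which is consistent with the 1 MB cap applying to data *read* by the query rather than to the size of the response returned.

```python
# Back-of-the-envelope estimate from the numbers in the question.
# Assumptions: 1 MB == 1,048,576 bytes, and items are roughly equal in size.
import math

PAGE_READ_LIMIT_BYTES = 1_048_576  # per-Query read cap (assumed binary MB)
first_page_count = 5667            # 'ScannedCount' of the first Boto3 page
total_count = 18817                # total reported by the AWS CLI
response_bytes = 230               # 'content-length' of the COUNT response

# Average data read per item if the first page stopped at the 1 MB cap.
avg_item_bytes = PAGE_READ_LIMIT_BYTES / first_page_count
# Pages needed to cover all matching items at that rate.
pages_needed = math.ceil(total_count / first_page_count)

print(f"~{avg_item_bytes:.0f} bytes read per item")
print(f"~{pages_needed} pages to count all {total_count} items")
print(f"response body is {response_bytes} bytes, far below the read cap")
```

Under these assumptions each item accounts for roughly 185 bytes of read data, and counting all 18,817 items takes about 4 pages, regardless of how small each COUNT response is.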
Just tracked down this issue myself. The AWS CLI automatically paginates through the DynamoDB query results and sums the counts from all the pages. To stop it from doing this, add --no-paginate to your command, as listed on this page.
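The summation the CLI performs can be reproduced in Boto3 with the repeated-query approach described in the question. This is a minimal sketch, assuming a `table` resource object as in the question; `count_all` is a hypothetical helper name. It re-issues the query with `ExclusiveStartKey` set to the previous page's `LastEvaluatedKey` and adds up each page's `Count` until a page comes back without a `LastEvaluatedKey`.

```python
from typing import Any, Callable, Dict


def count_all(query_fn: Callable[..., Dict[str, Any]], **query_kwargs: Any) -> int:
    """Sum 'Count' across every page of a Select='COUNT' query.

    count_all is a hypothetical helper; query_fn is expected to behave like
    Boto3's Table.query (keyword arguments in, response dict out).
    """
    kwargs = dict(query_kwargs, Select="COUNT")
    total = 0
    while True:
        page = query_fn(**kwargs)
        total += page["Count"]
        # The final page carries no LastEvaluatedKey.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Resume the next request where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With the question's table this would be called as `count_all(table.query, IndexName='classification-date-index', KeyConditionExpression=Key('classification').eq('INTERNAL') & Key('date').between('2020-04-10', '2020-04-25'))`. Passing the query function in, rather than calling `table.query` directly, keeps the helper independent of how the table object is constructed.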