Tags: python, amazon-web-services, pagination, amazon-dynamodb, boto3

How to use StartingToken with DynamoDB pagination scan


I have a DynamoDB table and I want to output items from it to a client using pagination. I thought I'd use DynamoDB.Paginator.Scan and supply StartingToken, but I don't see NextToken in the output of either the page or the iterator itself. So how do I get it?

My goal is a REST API where the client requests the next X items from a table, supplying StartingToken to iterate. Initially there's no token, but with each response the server returns a NextToken, which the client supplies as StartingToken to get the next X items.

import boto3
import json
table="TableName"
client = boto3.client("dynamodb")
paginator = client.get_paginator("scan")
token = None
size=1

for i in range(1,10):
    client.put_item(TableName=table, Item={"PK":{"S":str(i)},"SK":{"S":str(i)}})

it = paginator.paginate(
    TableName=table,
    ProjectionExpression="PK,SK",
    PaginationConfig={"MaxItems": 100, "PageSize": size, "StartingToken": token}
)

for page in it:
    print(json.dumps(page, indent=2))
    break

As a side note - how do I get one page from paginator without using break/for? I tried using next(it) but it does not work.

Here's the it object:

{
'_input_token': ['ExclusiveStartKey'],
 '_limit_key': 'Limit',
 '_max_items': 100,
 '_method': <bound method ClientCreator._create_api_method.<locals>._api_call of <botocore.client.DynamoDB object at 0x000001CBA5806AA0>>,
 '_more_results': None,
 '_non_aggregate_key_exprs': [{'type': 'field', 'children': [], 'value': 'ConsumedCapacity'}],
 '_non_aggregate_part': {'ConsumedCapacity': None},
 '_op_kwargs': {'Limit': 1,
                'ProjectionExpression': 'PK,SK',
                'TableName': 'TableName'},
 '_output_token': [{'type': 'field', 'children': [], 'value': 'LastEvaluatedKey'}],
 '_page_size': 1,
 '_result_keys': [{'type': 'field', 'children': [], 'value': 'Items'},
                  {'type': 'field', 'children': [], 'value': 'Count'},
                  {'type': 'field', 'children': [], 'value': 'ScannedCount'}],
 '_resume_token': None,
 '_starting_token': None,
 '_token_decoder': <botocore.paginate.TokenDecoder object at 0x000001CBA5D81960>,
 '_token_encoder': <botocore.paginate.TokenEncoder object at 0x000001CBA5D82290>
}

And the page:

{
  "Items": [
    {
      "PK": {
        "S": "2"
      },
      "SK": {
        "S": "2"
      }
    }
  ],
  "Count": 1,
  "ScannedCount": 1,
  "LastEvaluatedKey": {
    "PK": {
      "S": "2"
    },
    "SK": {
      "S": "2"
    }
  },
  "ResponseMetadata": {
    "RequestId": "DBE4ON8SI0GOTS2RRO2OG43QJVVV4KQNSO5AEMVJF66Q9ASUAAJG",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "server": "Server",
      "date": "Fri, 30 Dec 2022 11:37:52 GMT",
      "content-type": "application/x-amz-json-1.0",
      "content-length": "121",
      "connection": "keep-alive",
      "x-amzn-requestid": "DBE4ON8SI0GOTS2RRO2OG43QJVVV4KQNSO5AEMVJF66Q9ASUAAJG",
      "x-amz-crc32": "973385738"
    },
    "RetryAttempts": 0
  }
}

I thought I could use LastEvaluatedKey, but that throws an error. I also tried to get a token like this, but it did not work:

it._token_encoder.encode(page["LastEvaluatedKey"])

I also thought about using just scan without the iterator, but I'm actually outputting a heavily filtered result set. I need to set Limit to a very large value to get any results, and I don't want too many results at once. Is there a way to scan up to 1000 items but stop as soon as 10 items are found?


Solution

  • I would suggest not using the paginator, but rather calling the lower-level Scan or Query operation directly. The reason is the confusion between NextToken and LastEvaluatedKey, which are not interchangeable:

    • LastEvaluatedKey is passed to ExclusiveStartKey (a parameter of the Scan/Query call itself)
    • NextToken is passed to StartingToken (in the paginator's PaginationConfig)
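If you do want to stay with the paginator, the distinction above can be sketched as follows. As far as I can tell from botocore, the NextToken is only produced once MaxItems has been reached and the iterator has been drained, which is why it never shows up when you break out of the loop after the first page. The helper name fetch_page is hypothetical, and the sketch assumes a paginator created with boto3.client("dynamodb").get_paginator("scan").

```python
def fetch_page(paginator, table_name, page_size, starting_token=None):
    """Return (items, next_token) for one client-facing page.

    `paginator` is assumed to be a DynamoDB "scan" paginator, e.g.
    boto3.client("dynamodb").get_paginator("scan").
    """
    page_iterator = paginator.paginate(
        TableName=table_name,
        PaginationConfig={
            "MaxItems": page_size,   # stop once this many items are collected
            "PageSize": page_size,   # sent to DynamoDB as Limit on each request
            "StartingToken": starting_token,  # None on the very first call
        },
    )
    # build_full_result() drains the iterator and merges the pages; the
    # "NextToken" key is only present if more items remain in the table.
    result = page_iterator.build_full_result()
    return result.get("Items", []), result.get("NextToken")
```

The returned token is an opaque string that can be handed to the REST client and passed back in as starting_token on the next request. On the side question: the object returned by paginate() is iterable but not itself an iterator, so a single page can be taken with next(iter(it)) rather than next(it).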

    It's preferable to use the Resource interface, which I believe makes pagination less confusing:

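    import boto3

    region = 'us-east-1'  # assumed value; use your own region

    dynamodb = boto3.resource('dynamodb', region_name=region)

    table = dynamodb.Table('my-table')

    # Scan is used here to match the question; Query works the same way
    # but additionally requires a KeyConditionExpression.
    response = table.scan()
    data = response['Items']

    # LastEvaluatedKey indicates that there are more results
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])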
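For the REST use case in the question (return roughly 10 filtered items, but give up after evaluating about 1000), the same LastEvaluatedKey loop can be bounded and the key handed to the caller as a token. This is only a sketch under some assumptions: the names encode_token, decode_token and fetch_filtered are hypothetical, the base64/JSON token format is my own and is not the format of botocore's NextToken, and scan is assumed to be the low-level client.scan method, whose keys are plain JSON as long as they are strings or numbers.

```python
import base64
import json


def encode_token(last_evaluated_key):
    """Turn a LastEvaluatedKey dict into an opaque URL-safe string."""
    if last_evaluated_key is None:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Inverse of encode_token; None means 'start from the beginning'."""
    if token is None:
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def fetch_filtered(scan, wanted=10, scan_budget=1000, per_request=100,
                   token=None, **scan_kwargs):
    """Collect at least `wanted` items, evaluating at most `scan_budget` items.

    `scan` is assumed to behave like boto3.client("dynamodb").scan.
    Returns (items, next_token); next_token is None when the table is exhausted.
    """
    items = []
    evaluated = 0
    start_key = decode_token(token)
    while len(items) < wanted and evaluated < scan_budget:
        kwargs = dict(scan_kwargs, Limit=min(per_request, scan_budget - evaluated))
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        response = scan(**kwargs)
        # Keep the whole page: LastEvaluatedKey marks the end of the page,
        # so dropping matches here would skip them on the next call.
        items.extend(response.get("Items", []))
        evaluated += response.get("ScannedCount", 0)
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            break
    return items, encode_token(start_key)
```

A call might look like fetch_filtered(client.scan, wanted=10, scan_budget=1000, TableName="TableName", FilterExpression=...). Because whole pages are kept, a response can contain a few more than wanted items; a smaller per_request value reduces that overshoot at the cost of more round trips.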