python pagination amazon-dynamodb boto3 filterexpression

Paginating data from DynamoDB when FilterExpression and Limit are involved


I need to paginate the data from a DynamoDB table. I have an application with Next and Previous buttons. Users should be able to navigate to the next and previous pages by clicking the respective buttons.

This is the code I've written for it:


import json
from operator import itemgetter

# query_items(), encode_string() and generate_last_evaluated_key() are
# custom helpers defined elsewhere in the application.


def paginate_dynamodb_items(table_obj, index, key, page_limit=10, **kwargs):
    """
    Paginate DynamoDB items and return one page with next/transition tokens.
    """
    # Normal requests read the index in descending sort-key order
    # (ScanIndexForward=False); "reverse" requests (Previous button) read it
    # in ascending order.
    kwargs['scan_index_forward'] = kwargs['pagination_direction'] == 'reverse'
    paginated_response = {
        kwargs['item_type']: [],
        "next_token": None,
        "transition_token": None
    }
    query_response = query_items(
        table_obj,
        index,
        key,
        return_whole_item=True,
        limit=page_limit,
        **kwargs
    )
    data_res = query_response['Items']
    last_evaluated_key = query_response.get('LastEvaluatedKey')
    # Limit caps the items DynamoDB evaluates before FilterExpression runs,
    # so keep querying until more than page_limit items are collected (the
    # extra item shows that a next page exists) or the results run out.
    while last_evaluated_key and len(data_res) <= page_limit:
        query_response = query_items(
            table_obj,
            index,
            key,
            return_whole_item=True,
            limit=page_limit,
            **{
                **kwargs,
                "last_evaluated_key": last_evaluated_key
            }
        )
        data_res.extend(query_response["Items"])
        last_evaluated_key = query_response.get('LastEvaluatedKey')
    # Items beyond page_limit were fetched, so a next page exists even if
    # DynamoDB returned no LastEvaluatedKey.
    if len(data_res) > page_limit:
        data_res = data_res[:page_limit]
        if not last_evaluated_key:
            last_evaluated_key = True
    items_in_page = data_res
    if not items_in_page:
        return paginated_response
    # Next page: resume right after the last item of this page
    next_page_token = encode_string(
        json.dumps(
            generate_last_evaluated_key(items_in_page[-1], index)
        )
    ) if last_evaluated_key else None
    # Transition (previous page) token: generated from the first item, and
    # only when this request itself started from a token
    transition_token = encode_string(
        json.dumps(
            generate_last_evaluated_key(items_in_page[0], index)
        )
    ) if kwargs.get('last_evaluated_key') else None
    # Always display in descending 'time' order, whichever way we queried
    display_items = sorted(items_in_page, key=itemgetter('time'), reverse=True)
    return {
        kwargs['item_type']: display_items,
        "next_token": next_page_token,
        "transition_token": transition_token
    }

In the function above, query_items(), generate_last_evaluated_key(), encode_string(), etc. are custom helper functions that work as intended.

next_token and transition_token are the pagination tokens generated by the function above. The Previous and Next buttons are disabled on the first and last pages, respectively, because transition_token and next_token are null in those cases.
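For context, a token like this can be as simple as the key dictionary serialized to JSON and base64-encoded. The sketch below is only an assumption about what encode_string() might do, not the author's helper: encode_token() and decode_token() are hypothetical names, and the sketch assumes numeric key attributes are whole numbers (boto3's resource API returns DynamoDB numbers as Decimal, which json cannot serialize without help).

```python
import base64
import json
from decimal import Decimal


def _to_json(obj):
    # boto3's resource API returns DynamoDB numbers as Decimal. ASSUMPTION:
    # numeric key attributes are whole numbers (e.g. epoch timestamps).
    if isinstance(obj, Decimal) and obj == obj.to_integral_value():
        return int(obj)
    raise TypeError(f"cannot encode {obj!r} in a pagination token")


def encode_token(key):
    """Turn a DynamoDB key dict into an opaque, URL-safe token (or None)."""
    if key is None:
        return None
    raw = json.dumps(key, default=_to_json, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Inverse of encode_token; the result can be used as ExclusiveStartKey."""
    if token is None:
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
```

Returning None for a missing key keeps the "null token disables the button" convention intact on both sides of the round trip.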

By sending transition_token as the ExclusiveStartKey with ScanIndexForward set to True (the reverse of the default descending order), I can fetch the previous page and then sort it by the time attribute so that the order is consistent across all pages.
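A minimal sketch of the request parameters for one page, assuming boto3's resource-level Table.query(), which accepts IndexName, KeyConditionExpression, FilterExpression, Limit, ScanIndexForward and ExclusiveStartKey. build_page_query() and to_display_order() are hypothetical helpers; the direction convention mirrors the `True if kwargs['pagination_direction'] == 'reverse' else False` line in the function above.

```python
from operator import itemgetter


def build_page_query(index_name, key_condition, page_limit, direction,
                     start_key=None, filter_expression=None):
    """Build keyword arguments for Table.query() for one page (hypothetical).

    ASSUMPTION: normal pages are read newest-first (ScanIndexForward=False),
    so the "reverse" direction (Previous button) flips it to True.
    """
    params = {
        "IndexName": index_name,
        "KeyConditionExpression": key_condition,
        "Limit": page_limit,
        "ScanIndexForward": direction == "reverse",
    }
    if start_key is not None:
        # The query resumes just past this key (exclusive).
        params["ExclusiveStartKey"] = start_key
    if filter_expression is not None:
        params["FilterExpression"] = filter_expression
    return params


def to_display_order(items, sort_attr="time"):
    """Re-sort a page in descending sort_attr order, whatever the fetch order."""
    return sorted(items, key=itemgetter(sort_attr), reverse=True)
```

A previous-page call would then look like `table.query(**build_page_query("time-index", Key("user_id").eq("u1"), 10, "reverse", start_key=decoded_transition_token))`, where `Key` comes from `boto3.dynamodb.conditions` and the index name, attribute name and `decoded_transition_token` are placeholders. Passing `response["Items"]` through to_display_order() restores the descending time order.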

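However, when a FilterExpression is combined with Limit, DynamoDB applies Limit to the items it reads before the filter runs, so a single query can return fewer than page_limit items (even none) while still returning a LastEvaluatedKey. Filling one page then takes many looped queries, which significantly increases the execution time.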
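To make the cost concrete, here is a pure-Python simulation (no AWS calls) of those semantics: each query reads at most Limit items, applies the filter afterwards, and reports a "last evaluated key" whenever it stopped because Limit was reached. simulated_query() and count_round_trips() are hypothetical, and DynamoDB's 1 MB per-request cap is ignored.

```python
def simulated_query(items, limit, start=0, predicate=lambda item: True):
    """Mimic Query with Limit + FilterExpression over an in-memory list.

    Reads up to `limit` items from `start`, then applies `predicate`
    (standing in for FilterExpression). Returns (matches, next_start);
    next_start plays the role of LastEvaluatedKey and is set whenever the
    read stopped because `limit` was reached, else None.
    """
    window = items[start:start + limit]
    matches = [item for item in window if predicate(item)]
    next_start = start + len(window) if len(window) == limit else None
    return matches, next_start


def count_round_trips(items, page_limit, predicate):
    """Count fixed-Limit queries needed to collect page_limit matches."""
    collected, start, trips = [], 0, 0
    while True:
        matches, start = simulated_query(items, page_limit, start, predicate)
        trips += 1
        collected.extend(matches)
        if start is None or len(collected) >= page_limit:
            return trips, collected[:page_limit]
```

With 1,000 items of which 1 in 50 passes the filter, `count_round_trips(list(range(1000)), 10, lambda n: n % 50 == 0)` needs 46 queries to assemble a single 10-item page.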

How can I optimize this function for lower latency and more efficient querying while keeping the same response structure (next_token and transition_token) and still supporting a FilterExpression?

Thanks in advance!


Solution

  • I implemented a design inspired by the exponential backoff algorithm: while querying with FilterExpression and Limit, the Limit is increased exponentially on each iteration (capped at a threshold value). This significantly reduced the overall loading time of the paginated data.
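A minimal sketch of that idea, not the author's actual code. It assumes a hypothetical query_fn(limit, start_key) wrapper around Table.query() (with the FilterExpression already set) that returns (items, last_evaluated_key); max_limit is a hypothetical cap. Limit starts at page_limit and doubles after each round trip. As in the question's function, collection continues until one item more than the page size is found, so whether a next page exists is known without an extra request, and the caller builds the next token from the last returned item.

```python
def fetch_filtered_page(query_fn, page_limit, max_limit=1000, start_key=None):
    """Collect one page of filtered items, growing Limit exponentially.

    query_fn(limit, start_key) -> (items, last_evaluated_key) is a
    hypothetical wrapper around Table.query() with FilterExpression set.
    Returns (page_items, has_more).
    """
    collected = []
    limit = page_limit
    last_key = start_key
    while True:
        items, last_key = query_fn(limit, last_key)
        collected.extend(items)
        # Stop once we hold more than a page (the extra item proves a next
        # page exists) or there is nothing left to read.
        if len(collected) > page_limit or last_key is None:
            break
        # Double the read size for the next round trip, capped at max_limit.
        limit = min(limit * 2, max_limit)
    return collected[:page_limit], len(collected) > page_limit
```

In practice query_fn would call `table.query(**params)` with `Limit=limit` (adding `ExclusiveStartKey` when start_key is set) and return `response["Items"], response.get("LastEvaluatedKey")`. The cap matters because a FilterExpression does not reduce consumed read capacity: every item read counts, including those discarded in the final look-ahead round. DynamoDB's 1 MB limit per Query response also still applies regardless of Limit.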