Tags: java, amazon-dynamodb, dynamodb-queries

What is the difference between PaginatedQueryList and QueryResultPage in DynamoDB?


I am currently working with DynamoDB in Java, using DynamoDBMapper. With a DynamoDBQueryExpression, the results can come back as either a PaginatedQueryList or a QueryResultPage, depending on which mapper method is called:

query method - returns a PaginatedQueryList
queryPage method - returns a QueryResultPage

The PaginatedQueryList documentation says it first loads 1 MB of data and, as we iterate over it, loads the next page when needed. But what about QueryResultPage? It also says it loads 1 MB of data, but what happens if we iterate over it? Will it load the second page, or will it only ever give us that first 1 MB? I couldn't find anything about that. Also, QueryResultPage gives us the LastEvaluatedKey but PaginatedQueryList does not. Is there a way to get the LastEvaluatedKey from a PaginatedQueryList, or do we always have to use QueryResultPage if we need that key?

Also, instead of the following code,

PaginatedQueryList<Data> data = dynamoDBMapper.query(Data.class, queryExpression);

if we use the following,

List<Data> data = dynamoDBMapper.query(Data.class, queryExpression);
data.size();

will it load all of the matching data from the DB? And if I use stream() instead of data.size(), will it load everything?


Solution

  • TL;DR:

    • QueryResultPage won't load any additional data lazily, only PaginatedQueryList does that.
    • PaginatedQueryList abstracts the pagination away for you, which is why it doesn't expose the key. If you need the key, you have to use queryPage, which returns a QueryResultPage. If you need more than the first page, you have to request the following pages yourself.
    • .size() will load all data into memory, because every item has to be fetched before the items can be counted.
    • .stream() might load all data into memory, depending on what you do with the stream. If you say .stream().limit(1) then no more pages will be loaded. But if you say .stream().count() then all pages will be loaded.

    Looking at the documentation of the two, PaginatedQueryList is an object that tries to hide the fact that the underlying results are paginated.

    From https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/datamodeling/PaginatedQueryList.html:

    [...] Paginated results are loaded on demand when the user executes an operation that requires them. Some operations, such as size(), must fetch the entire list, but results are lazily fetched page by page when possible.

    So, if you don't want to deal with pagination yourself, use the query method. But keep in mind that your application will ultimately still have to page through the results if you want to return all of them (or know their size).
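
    To make the lazy-loading behaviour concrete, here is a minimal toy model in plain Java. It is only a sketch and not the SDK's implementation: LazyPagedList, its pagesFetched() counter and the in-memory "remote pages" are all hypothetical names invented for illustration. It assumes, as the question describes, that the first page is loaded up front; after that, iterating fetches one page at a time and size() fetches everything that is left.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a lazily paginated result list (hypothetical, not the SDK class). */
class LazyPagedList implements Iterable<Integer> {
    private final List<List<Integer>> remotePages; // stands in for the result pages held by the database
    private final List<Integer> loaded = new ArrayList<>();
    private int pagesFetched = 0;

    LazyPagedList(List<List<Integer>> remotePages) {
        this.remotePages = remotePages;
        fetchNextPage(); // assumption: the first page is loaded when the query is issued
    }

    /** Loads one more page if any is left; returns false when there are no more pages. */
    private boolean fetchNextPage() {
        if (pagesFetched >= remotePages.size()) {
            return false;
        }
        loaded.addAll(remotePages.get(pagesFetched));
        pagesFetched++;
        return true;
    }

    int pagesFetched() {
        return pagesFetched;
    }

    /** Counting needs every item, so all remaining pages are fetched first. */
    int size() {
        while (fetchNextPage()) {
            // keep loading until no pages are left
        }
        return loaded.size();
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                // Fetch another page only once the already loaded items are used up.
                while (index >= loaded.size()) {
                    if (!fetchNextPage()) {
                        return false;
                    }
                }
                return true;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return loaded.get(index++);
            }
        };
    }
}

class LazyPagedListDemo {
    public static void main(String[] args) {
        LazyPagedList list = new LazyPagedList(Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5)));

        Iterator<Integer> it = list.iterator();
        it.next();
        it.next();
        System.out.println(list.pagesFetched()); // 1: both items came from the first page

        it.next();
        System.out.println(list.pagesFetched()); // 2: the third item forced a second fetch

        System.out.println(list.size());         // 5: size() had to load the last page too
        System.out.println(list.pagesFetched()); // 3
    }
}
```

    The point of the counter is that the cost of a call depends on how far into the list it reaches, which is easy to miss when the list looks like an ordinary List.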

    On the other hand, QueryResultPage is closer to the DynamoDB API. You are dealing with a single page, and you can use getLastEvaluatedKey() to get the value to pass to setExclusiveStartKey (on your DynamoDBQueryExpression) for the next request.
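
    The shape of that manual loop can be sketched without the SDK. In the snippet below, Page and fetchPage are hypothetical stand-ins for QueryResultPage and queryPage, the "table" is just an in-memory list of unique strings, and the key is a plain String, whereas the real API uses a map of attribute values. Only the control flow is meant to carry over: fetch a page, collect its results, feed the last evaluated key back in as the exclusive start key, and stop when the key is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for QueryResultPage: one page of items plus the key to resume from. */
class Page {
    private final List<String> results;
    private final String lastEvaluatedKey; // null means there is nothing left to fetch

    Page(List<String> results, String lastEvaluatedKey) {
        this.results = results;
        this.lastEvaluatedKey = lastEvaluatedKey;
    }

    List<String> getResults() {
        return results;
    }

    String getLastEvaluatedKey() {
        return lastEvaluatedKey;
    }
}

class ManualPagingDemo {
    /**
     * Hypothetical stand-in for queryPage: returns up to pageSize items that come
     * after exclusiveStartKey (or from the start if the key is null).
     * Assumes pageSize >= 1 and that the items in the table are unique.
     */
    static Page fetchPage(List<String> table, int pageSize, String exclusiveStartKey) {
        int start = exclusiveStartKey == null ? 0 : table.indexOf(exclusiveStartKey) + 1;
        int end = Math.min(start + pageSize, table.size());
        List<String> items = new ArrayList<>(table.subList(start, end));
        // Report a resume key only if more items remain after this page.
        String lastKey = end < table.size() ? table.get(end - 1) : null;
        return new Page(items, lastKey);
    }

    /** Requests page after page until no last evaluated key is returned. */
    static List<String> fetchAll(List<String> table, int pageSize) {
        List<String> all = new ArrayList<>();
        String startKey = null;
        do {
            Page page = fetchPage(table, pageSize, startKey);
            all.addAll(page.getResults());
            startKey = page.getLastEvaluatedKey(); // becomes the next exclusive start key
        } while (startKey != null);
        return all;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(fetchAll(table, 2)); // [a, b, c, d, e], collected over three pages
    }
}
```

    With the real mapper, the same loop would call queryPage, read getLastEvaluatedKey() from the returned page and set it on the query expression with setExclusiveStartKey before the next call. Stopping early, or handing the key to the caller instead of looping, is what makes this variant useful when you do not want every page.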

    In summary:

    • query is more user-friendly because it hides the explicit pagination, but if your result spans many pages, your code might get slower without you noticing at first, because those pages are loaded lazily.
    • queryPage is more explicit about the intent. You must load every page manually, which makes you think about whether you really need all the data, or whether you would rather pass the pagination on to your own client, for instance.

    The official documentation of queryPage gives a similar description, at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.Methods.html#DynamoDBMapper.Methods.queryPage:

    Queries a table or secondary index and returns a single page of matching results. As with the query method, you must specify a partition key value and a query filter that is applied on the sort key attribute. However, queryPage returns only the first "page" of data, that is, the amount of data that fits in 1 MB