I am currently working with DynamoDB in Java, using `DynamoDBMapper`. I saw that when we use a `DynamoDBQueryExpression` we can get back either a `PaginatedQueryList` or a `QueryResultPage`. Depending on which one we want, these are the methods we have to use:
- `query` method: returns a [PaginatedQueryList][1]
- `queryPage` method: returns a [QueryResultPage][1]
The documentation for `PaginatedQueryList` says it first loads 1 MB of data and then, as we iterate, loads the next page if needed. But what about `QueryResultPage`? It says it loads 1 MB of data, but what happens if we iterate over it? Will it load the second page, or only give us that 1 MB of data? I couldn't find anything about that. Also, `QueryResultPage` gives us the `LastEvaluatedKey` but `PaginatedQueryList` does not. So is there a way to get the `LastEvaluatedKey` from a `PaginatedQueryList`, or do we always have to use `QueryResultPage` if we need that key?
And also instead of the following code,
PaginatedQueryList<Data> data = dynamoDBMapper.query(Data.class, queryExpression);
If we use the following,
List<Data> data = dynamoDBMapper.query(Data.class, queryExpression);
data.size();
Will it load all the data found in the DB? And what if I use `stream()` instead of `data.size()`; will it load everything?
TL;DR:
- `QueryResultPage` won't load any additional data lazily; only `PaginatedQueryList` does that.
- `PaginatedQueryList` abstracts away the pagination for you, which is why it doesn't expose the key. If you need the key, you'll need to use `queryPage` with `QueryResultPage`. If you need more than just the first page, you'll have to request the other pages on your own.
- `.size()` will load all data into memory, as you need to have the data in order to count it.
- `.stream()` might load all data into memory, depending on what you do with the stream. If you say `.stream().limit(1)` then no more pages will be loaded. But if you say `.stream().count()` then all pages will be loaded.

Looking at the documentation of those two, `PaginatedQueryList` seems to be an object that tries to hide from you the fact that the results are paginated underneath.
From https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/datamodeling/PaginatedQueryList.html (emphasis mine):
> [...] Paginated results are loaded on demand when the user executes an operation that requires them. Some operations, such as size(), must fetch the entire list, but results are lazily fetched page by page when possible.
So, if you basically don't want to deal with pagination, use the `query` method. But keep in mind that your application will ultimately still need to page through the results if you want to return all of them (or know their size).
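To make the "loaded on demand" behaviour concrete, here is a minimal, self-contained sketch. It does not use the AWS SDK at all: `LazyPagedList` is a hypothetical toy class that only mimics the idea described in the Javadoc quote (iteration fetches a page when needed, `size()` has to fetch everything) and counts the simulated round trips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy stand-in for a lazily paginated result list. This is NOT the SDK class;
// it only models "fetch the next page on demand" and counts the fetches.
class LazyPagedList implements Iterable<String> {
    private final List<List<String>> remotePages; // pretend each inner list is one page
    private final List<String> loaded = new ArrayList<>();
    private int pagesFetched = 0;

    LazyPagedList(List<List<String>> remotePages) {
        this.remotePages = remotePages;
    }

    // Simulates one round trip; returns false when no page is left.
    private boolean fetchNextPage() {
        if (pagesFetched >= remotePages.size()) {
            return false;
        }
        loaded.addAll(remotePages.get(pagesFetched));
        pagesFetched++;
        return true;
    }

    // Counting needs every item, so all remaining pages are fetched.
    int size() {
        while (fetchNextPage()) {
            // keep fetching until the source is exhausted
        }
        return loaded.size();
    }

    int pagesFetched() {
        return pagesFetched;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                // Fetch further pages only once the loaded items run out.
                while (index >= loaded.size()) {
                    if (!fetchNextPage()) {
                        return false;
                    }
                }
                return true;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return loaded.get(index++);
            }
        };
    }
}

class LazyPagedListDemo {
    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c", "d"), Arrays.asList("e"));

        LazyPagedList first = new LazyPagedList(pages);
        first.iterator().next();                  // touches only the first page
        System.out.println(first.pagesFetched()); // prints 1

        LazyPagedList second = new LazyPagedList(pages);
        System.out.println(second.size());         // prints 5
        System.out.println(second.pagesFetched()); // prints 3
    }
}
```

Reading a single element costs one simulated fetch, while `size()` walks every page. With the real mapper each of those fetches would be a network call, which is why an innocent-looking `size()` can be expensive.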
On the other hand, `QueryResultPage` is closer to the DynamoDB API. You are dealing with a single page, and you can use `getLastEvaluatedKey()` to get the value to pass to the next `setExclusiveStartKey` (on your `DynamoDBQueryExpression`).
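The "last evaluated key in, exclusive start key out" loop can be sketched as follows. This is a self-contained illustration, not SDK code: `Page` and the static `queryPage` here are hypothetical stand-ins for `QueryResultPage` and the mapper's `queryPage`, the "table" is just an in-memory list, and the key is a plain index. I am also assuming the usual convention that a missing (null) last evaluated key means there are no more pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for QueryResultPage: one batch of results plus the
// key to resume from (null when this was the last page).
class Page {
    final List<String> results;
    final Integer lastEvaluatedKey;

    Page(List<String> results, Integer lastEvaluatedKey) {
        this.results = results;
        this.lastEvaluatedKey = lastEvaluatedKey;
    }
}

class ManualPaging {
    // Hypothetical stand-in for one queryPage call: returns at most pageSize
    // items that come after exclusiveStartKey (null means "from the start").
    static Page queryPage(List<String> table, Integer exclusiveStartKey, int pageSize) {
        int from = (exclusiveStartKey == null) ? 0 : exclusiveStartKey + 1;
        int to = Math.min(from + pageSize, table.size());
        Integer last = (to < table.size()) ? Integer.valueOf(to - 1) : null;
        return new Page(new ArrayList<>(table.subList(from, to)), last);
    }

    // The caller drives pagination: feed each page's last key into the next request.
    // With the real mapper, this is where getLastEvaluatedKey() would be passed
    // to setExclusiveStartKey(...) before calling queryPage again.
    static List<String> fetchAll(List<String> table, int pageSize) {
        List<String> all = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = queryPage(table, startKey, pageSize);
            all.addAll(page.results);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(fetchAll(table, 2)); // prints [a, b, c, d, e]
    }
}
```

Because the loop is in your own code, you can just as well stop after the first page and hand the key back to the caller instead of collecting everything.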
In summary:
- `query` is more user friendly, as it hides the explicit pagination, but if your result contains many pages, your code might get slower without you noticing it at first, because those pages are loaded lazily.
- `queryPage` is more explicit about the intent. You must load every page manually, and therefore have to think about whether you really need all the data, or whether you would rather offload pagination further to your client, for instance.

You can read a similar description in the official documentation of `queryPage` at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.Methods.html#DynamoDBMapper.Methods.queryPage (emphasis mine):
> Queries a table or secondary index and returns a single page of matching results. As with the query method, you must specify a partition key value and a query filter that is applied on the sort key attribute. However, queryPage returns only the first "page" of data, that is, the amount of data that fits in 1 MB.