I am new to using aws-amplify and have a function similar to this, which runs a query called listItems and returns items where isEnabled is true (from a DynamoDB table).
I want this to filter the entire table, which may be huge, so I can't simply set a limit like 1000 and leave it at that. Is there a way to specify an unlimited query and scan everything in the table? Or is there a different property I should be using instead?
import { API } from 'aws-amplify'
// Assumed path: Amplify codegen usually writes the generated queries here
import * as queries from './graphql/queries'
export async function getAllEnabledListItems() {
  const { data } = await API.graphql({
    query: queries.listItems,
    variables: { filter: { isEnabled: { eq: true } }, limit: 10000 },
    authMode: 'AMAZON_COGNITO_USER_POOLS'
  })
  return data
}
Rather than scanning every item and then filtering, you should consider adding a GSI (global secondary index) that covers the "enabled" items in the table and then querying that. Querying the index will be much more efficient (i.e. faster and cheaper) than scanning, at the expense of slightly higher write and storage costs. Usually it's a good trade-off.
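As a rough sketch of what a query against such an index might look like, here is a helper that builds the request parameters in the DynamoDB DocumentClient format. The index name `enabled-index` and the string attribute `enabledFlag` are hypothetical: DynamoDB key attributes must be String, Number, or Binary, so a boolean like `isEnabled` can't serve as an index key directly. If `enabledFlag` is only written on enabled items, the index stays sparse and contains just those items.

```javascript
// Builds parameters for a DynamoDB Query against a GSI.
// ASSUMPTIONS: 'enabled-index' and 'enabledFlag' are hypothetical names;
// substitute whatever index and key attribute your table actually defines.
function buildEnabledQueryParams(tableName, exclusiveStartKey) {
  const params = {
    TableName: tableName,
    IndexName: 'enabled-index',
    KeyConditionExpression: '#flag = :flag',
    ExpressionAttributeNames: { '#flag': 'enabledFlag' },
    ExpressionAttributeValues: { ':flag': 'Y' },
  };
  // Only set ExclusiveStartKey when continuing from a previous page.
  if (exclusiveStartKey) {
    params.ExclusiveStartKey = exclusiveStartKey;
  }
  return params;
}
```

The resulting object could be passed to a DocumentClient `query` call; the helper itself only builds the request and does not touch the network.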
Regardless of whether you query or scan, though, you'll have to deal with DynamoDB pagination once the result set grows too large (a single call returns at most 1 MB). If the result set hits that threshold, you get the first page of results and a LastEvaluatedKey. You then query again, passing the LastEvaluatedKey value as ExclusiveStartKey, and keep doing this until no LastEvaluatedKey comes back.
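The loop described above could be sketched roughly as follows. `runPage` is a hypothetical caller-supplied function (not part of any SDK) that sends one Query or Scan request and resolves to the raw response, so the sketch makes no assumption about which SDK version you use.

```javascript
// Collects every page of a paginated DynamoDB Query or Scan.
// ASSUMPTION: `runPage(params)` is supplied by the caller and resolves to
// a response shaped like { Items, LastEvaluatedKey }.
async function collectAllPages(runPage, baseParams) {
  const items = [];
  let exclusiveStartKey; // undefined for the first request
  do {
    const params = { ...baseParams };
    if (exclusiveStartKey) {
      params.ExclusiveStartKey = exclusiveStartKey;
    }
    const page = await runPage(params);
    for (const item of page.Items || []) {
      items.push(item);
    }
    // A missing LastEvaluatedKey means this was the last page.
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
  return items;
}
```

With the v2 SDK's DocumentClient, for example, `runPage` might be `(p) => docClient.query(p).promise()`. Note that this accumulates everything in memory, which is only reasonable when the full result set is known to be small.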
If you update your AppSync schema and resolver to pass this LastEvaluatedKey back as a paginationToken (or whatever you want to call it), you can requery repeatedly from your app, passing the latest token each time to get the next page of results. If you don't need all the results at once, you might consider calling these lazily, asking for another page of results only when you need it.
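On the app side, lazy paging might look something like this async generator. It is a sketch that assumes each response carries `items` plus a token field; the token is called `nextToken` here, which is the name Amplify's generated list queries use, so rename it if your schema calls it `paginationToken`. `fetchPage` is a hypothetical function you supply.

```javascript
// Lazily yields one page of items at a time; no request is made until the
// consumer asks for the next page.
// ASSUMPTION: `fetchPage(token)` is supplied by the caller and resolves to
// { items, nextToken }, with a null/undefined nextToken on the last page.
async function* pagesOf(fetchPage) {
  let token = null;
  do {
    const { items, nextToken } = await fetchPage(token);
    yield items;
    token = nextToken;
  } while (token);
}

// Possible wiring with the question's query (not executed here):
//   const fetchPage = async (nextToken) => {
//     const { data } = await API.graphql({
//       query: queries.listItems,
//       variables: { filter: { isEnabled: { eq: true } }, limit: 100, nextToken },
//       authMode: 'AMAZON_COGNITO_USER_POOLS'
//     })
//     return data.listItems
//   }
//   for await (const items of pagesOf(fetchPage)) { /* render items */ }
```

Breaking out of the `for await` loop early stops any further requests, which is what makes this suitable for "load more" style interfaces.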
There are some other approaches.
If you know the filtered set of results will always be <1MB, one approach would be to swap out your DynamoDB data source for a Lambda, and progressively scan and filter (or query) DynamoDB pages in a loop inside your Lambda before returning the filtered results to your AppSync resolver, and from there to your app.
The problems include limits on execution time and on response payload size.
Alternatively, if you can segment your items (or your "isEnabled" items) into multiple groups you can fan out your scan (or query) to implement parallel scans (or queries) before accumulating results as before. This may enable faster scans, but you'll still be limited in time and payload size so it's still problematic for huge table scans.
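For the scan case, DynamoDB's Scan accepts `Segment` and `TotalSegments` parameters for exactly this kind of fan-out. A minimal sketch, again assuming a hypothetical caller-supplied `scanPage` function that performs one Scan request and resolves to the raw response:

```javascript
// Scans a single segment to completion, following LastEvaluatedKey.
// ASSUMPTION: `scanPage(params)` is supplied by the caller and resolves to
// a response shaped like { Items, LastEvaluatedKey }.
async function scanSegment(scanPage, baseParams, segment, totalSegments) {
  const items = [];
  let exclusiveStartKey;
  do {
    const params = { ...baseParams, Segment: segment, TotalSegments: totalSegments };
    if (exclusiveStartKey) {
      params.ExclusiveStartKey = exclusiveStartKey;
    }
    const page = await scanPage(params);
    for (const item of page.Items || []) {
      items.push(item);
    }
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
  return items;
}

// Runs all segments concurrently and concatenates their results.
async function parallelScan(scanPage, baseParams, totalSegments) {
  const segments = Array.from({ length: totalSegments }, (_, segment) =>
    scanSegment(scanPage, baseParams, segment, totalSegments)
  );
  const results = await Promise.all(segments);
  return results.flat();
}
```

Each segment keeps its own pagination state, so the segments can progress independently. The whole result is still held in memory, so the time and payload limits mentioned above continue to apply.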
... LastEvaluatedKey / ExclusiveStartKey) in and out.
Adding a GSI, querying it (rather than scanning), and then adding pagination to your AppSync schema and app is the most robust solution.