javascript graphql amazon-dynamodb aws-amplify

How to use the GraphQL limit in AWS Amplify

I am new to aws-amplify and have a function similar to the one below. It calls a query named listItems and returns the items where isEnabled is true (from a DynamoDB table). I want the filter to cover the entire table, which may be huge, so I can't simply set a limit like 1000 and leave it at that. Is there a way to specify an unlimited query and scan everything in the table? Or is there a different property I should be using instead?

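 import { API } from 'aws-amplify'
 // Assumed path to the Amplify-generated queries module; adjust to your project
 import * as queries from './graphql/queries'

 // Returns items where isEnabled is true (a single request, capped by limit)
 export async function getAllEnabledListItems() {
   const { data } = await API.graphql({
     query: queries.listItems,
     variables: { filter: { isEnabled: { eq: true } }, limit: 10000 },
     authMode: 'AMAZON_COGNITO_USER_POOLS'
   })
   return data
 }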

Solution

DynamoDB Scan vs Query

Rather than scanning every item and then filtering, consider adding a GSI (global secondary index) that covers the "enabled" items in the table and querying that instead. A query reads only the matching items, so it is faster and cheaper than a scan, at the expense of slightly higher write and storage costs. It's usually a good trade-off.
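As a rough illustration of what querying such an index could look like, here is a minimal sketch that only builds the request parameters. The table name "Items", the index name "byEnabled", and the string attribute "enabledKey" are all hypothetical. A separate string attribute is assumed because DynamoDB key attributes must be String, Number, or Binary, so a Boolean isEnabled attribute can't serve as the index key directly. Writing "enabledKey" only on enabled items would also keep the index sparse.

```javascript
// Hypothetical names: table "Items", GSI "byEnabled", and a string attribute
// "enabledKey" that is written as "ENABLED" only on enabled items.
function buildEnabledQueryInput(exclusiveStartKey) {
  const input = {
    TableName: 'Items',
    IndexName: 'byEnabled',
    KeyConditionExpression: '#k = :v',
    ExpressionAttributeNames: { '#k': 'enabledKey' },
    ExpressionAttributeValues: { ':v': 'ENABLED' },
  };
  // Only set ExclusiveStartKey when continuing from a previous page.
  if (exclusiveStartKey) input.ExclusiveStartKey = exclusiveStartKey;
  return input;
}
```

With the AWS SDK for JavaScript v3 document client, the returned object could be passed to a QueryCommand from @aws-sdk/lib-dynamodb; in an AppSync setup the equivalent parameters would live in the resolver instead.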

Pagination

Regardless of whether you query or scan, you'll have to deal with DynamoDB pagination once a single request would read more than 1 MB of data (the limit applies to the data read, before any filter expression is evaluated). When that threshold is hit you get the first page of results plus a LastEvaluatedKey. You then query again, passing the LastEvaluatedKey value as ExclusiveStartKey, and repeat until no LastEvaluatedKey comes back.
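The loop described above can be sketched as follows. The fetchPage argument is a hypothetical function standing in for whatever issues the actual Query or Scan call; it is injected so the sketch stays independent of any particular SDK.

```javascript
// fetchPage is hypothetical: it takes an ExclusiveStartKey (undefined for the
// first page) and resolves to { Items, LastEvaluatedKey }, which is the shape
// of a DynamoDB Query or Scan response.
async function fetchAllPages(fetchPage) {
  const items = [];
  let startKey;
  do {
    const page = await fetchPage(startKey);
    items.push(...(page.Items ?? []));
    // A missing LastEvaluatedKey means this was the last page.
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}
```

Note that this accumulates everything in memory, which is only reasonable when the full result set is known to be small.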

If you update your AppSync schema and resolver to pass this LastEvaluatedKey back as a paginationToken (or whatever you want to call it), then you can requery repeatedly from your app, passing the latest token each time to get the next page of results. If you don't need all the results at once, you might consider calling these lazily, asking for another page of results only when you need it.
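On the app side, lazy paging could look something like the sketch below. It assumes the listItems query accepts limit and nextToken variables and returns data.listItems.items plus data.listItems.nextToken; that shape is an assumption about the schema, and the token name would be whatever you chose above. The graphql argument is injected; in an Amplify app it might be (opts) => API.graphql(opts).

```javascript
// Assumed response shape: data.listItems.{ items, nextToken }.
// graphql is injected so the sketch has no dependencies of its own.
async function* enabledItemPages(graphql, query, pageSize = 100) {
  let nextToken = null;
  do {
    const { data } = await graphql({
      query,
      variables: { filter: { isEnabled: { eq: true } }, limit: pageSize, nextToken },
      authMode: 'AMAZON_COGNITO_USER_POOLS',
    });
    // Hand one page to the caller; the next request only happens if the
    // caller keeps iterating.
    yield data.listItems.items;
    nextToken = data.listItems.nextToken;
  } while (nextToken);
}
```

A caller would consume it with for await (const page of enabledItemPages(...)) and can stop early with break. When a filter is applied to a scan, a page may hold fewer items than the limit, or none at all, while still carrying a token, so an empty page does not mean the end of the results.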

Other considerations

There are some other approaches.

If you know the filtered set of results will always be <1MB, one approach would be to swap out your DynamoDB data source for a Lambda function, then progressively scan and filter (or query) DynamoDB pages in a loop inside the Lambda before returning the filtered results to your AppSync resolver, and from there to your app.

The problems include:

How to guarantee that the filtered results set will always be under 1MB (AppSync's limit)
How to guarantee that the Lambda will return in time (AppSync time limit)
You're scanning the whole table (if you scan rather than query) but you're only interested in a subset of those items (the "isEnabled" items)
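To make the first two problems concrete, here is a minimal sketch of an accumulation loop that fails loudly instead of silently exceeding a budget. The byte budget and deadline values are placeholders, not AppSync's actual limits, and fetchPage is a hypothetical function resolving to { Items, LastEvaluatedKey } for a given start key.

```javascript
// Placeholder budgets; check the real AppSync limits before relying on them.
const DEFAULT_MAX_BYTES = 900 * 1024;
const DEFAULT_TIME_BUDGET_MS = 20000;

async function collectWithinBudget(fetchPage, options = {}) {
  const maxBytes = options.maxBytes ?? DEFAULT_MAX_BYTES;
  const deadlineMs = options.deadlineMs ?? Date.now() + DEFAULT_TIME_BUDGET_MS;
  const items = [];
  let bytes = 0;
  let startKey;
  do {
    const page = await fetchPage(startKey);
    for (const item of page.Items ?? []) {
      // Approximate the payload size by the item's JSON encoding.
      bytes += Buffer.byteLength(JSON.stringify(item));
      if (bytes > maxBytes) throw new Error('Result set exceeds the payload budget');
      items.push(item);
    }
    startKey = page.LastEvaluatedKey;
    // Give up if more pages remain but the time budget is spent.
    if (startKey && Date.now() > deadlineMs) {
      throw new Error('Ran out of time before reading all pages');
    }
  } while (startKey);
  return items;
}
```

The guards only detect the problem; they don't solve it, which is why this approach stays fragile for large tables.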

Alternatively, if you can segment your items (or your "isEnabled" items) into multiple groups you can fan out your scan (or query) to implement parallel scans (or queries) before accumulating results as before. This may enable faster scans, but you'll still be limited in time and payload size so it's still problematic for huge table scans.
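For the scan case, DynamoDB's Scan operation accepts Segment and TotalSegments parameters for exactly this kind of fan-out. A minimal sketch, assuming a hypothetical scanPage function that forwards those parameters to a Scan call and resolves to { Items, LastEvaluatedKey }:

```javascript
// scanPage is hypothetical: ({ Segment, TotalSegments, ExclusiveStartKey })
// => Promise<{ Items, LastEvaluatedKey }>, mirroring DynamoDB Scan parameters.
async function parallelScan(scanPage, totalSegments) {
  // Each segment is paginated independently with its own start key.
  const scanSegment = async (segment) => {
    const items = [];
    let startKey;
    do {
      const page = await scanPage({
        Segment: segment,
        TotalSegments: totalSegments,
        ExclusiveStartKey: startKey,
      });
      items.push(...(page.Items ?? []));
      startKey = page.LastEvaluatedKey;
    } while (startKey);
    return items;
  };

  // Run all segments concurrently, then merge the per-segment results.
  const perSegment = await Promise.all(
    Array.from({ length: totalSegments }, (_, segment) => scanSegment(segment))
  );
  return perSegment.flat();
}
```

More segments means more concurrent read capacity consumed, so the segment count is a trade-off rather than a free speed-up.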

Summary

DynamoDB enforces paginating results (max 1MB)
AppSync limits payload size (max 1MB, less if you're going to use subscriptions too)
DynamoDB Scans are less efficient than queries. Consider adding a GSI so that you can query instead of Scan / Filter.
Hacks to accumulate the pages of results inside Lambda or AppSync VTLs are fragile, and probably won't work for huge tables
Implementing pagination in your app will require updates to your AppSync schema to pass DynamoDB "pagination tokens" (LastEvaluatedKey / ExclusiveStartKey) in and out.

Adding a GSI, querying it (rather than scanning), and then adding pagination to your AppSync schema and app is the most robust solution.