
How to paginate a nested array in DynamoDB


This is the item schema in my DynamoDB table, named "Topic":

{
    "topicId": "string",
    "topic": "string",
    "updatedTimeUTC": "number",
    "articles": [
        {
            "articleId": "string",
            "headline": "string",
            "updatedTimeUTC": "number"
        },
        {
            "articleId": "string",
            "headline": "string",
            "updatedTimeUTC": "number"
        }
    ],
    "articleCount": "number"
}

As you can see, each item represents a topic, and each topic has a list of articles associated with it.

My access patterns look like this:

  1. Fetch topics sorted by updatedTimeUTC, paginated with a page size of 5.
  2. After fetching a single topic item by topicId, paginate its articles array.

I can achieve the first pattern but not the second. So far I am storing all the items in a single table. Do I need to change the schema and split topics and articles into multiple tables to support both access patterns?

I need pagination at topic level and also at articles level. Please suggest.


Solution

  • With your existing table design, the only article pagination you can do is client-side: after you fetch a particular topic, your own code skips x articles and takes the slice you need.

    Why is this the only option? Because all of a topic's articles are bundled inside the topic item, reads are all or nothing. As of today, no DynamoDB API (Query, GetItem, etc.) can page through a list attribute. A ProjectionExpression can pick out individual list elements by index (for example articles[0]), but DynamoDB still reads, and charges for, the whole item, so that is not real pagination.
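The client-side slicing described above could look roughly like the sketch below. It assumes the whole topic item has already been fetched (for example with GetItem) and converted to a plain JavaScript object. The helper name `paginateArticles` and the 1-based `pageNumber` are illustrative choices, not part of any DynamoDB API.

```javascript
// Return one page of a topic's nested articles, after the whole item was fetched.
// `pageNumber` is 1-based; `pageSize` defaults to 5 to match the question.
const paginateArticles = (topic, pageNumber = 1, pageSize = 5) => {
  const articles = topic.articles ?? [];
  const start = (pageNumber - 1) * pageSize;
  return {
    articles: articles.slice(start, start + pageSize),
    // True when at least one article remains after this page.
    hasMore: start + pageSize < articles.length,
  };
};
```

Every call still needs the full item in memory, so this only hides the cost of the read; it does not reduce it.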

    Client-side pagination is of course wasteful: you have to fetch the entire article list, which may include many articles you're not interested in. A DynamoDB item is capped at 400 KB, though, so the amount of data fetched per topic is bounded and might not be a big deal.

    But do you restrict the number of articles a topic can have? If not, you need to think about that. Because a single DynamoDB item can't exceed 400 KB, the list of articles must not grow to the point where the topic item crosses that limit.

    If your application shouldn't limit the number of articles on a topic, you should consider remodeling your table.

    This is a suggested table structure that can achieve your desired access patterns in a more scalable way:

    [Image not reproduced: diagram of the suggested table structure]

    In the suggested table structure:

    • Topics and articles are saved as separate items in the table. Articles that belong to the same topic share the same partition key, i.e. TOPIC#<TOPIC_ID>, but each has its own sort key (ARTICLE#<ARTICLE_ID>, as used in the query below).
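Since the original diagram is not reproduced here, the following sketch shows one way the nested topic from the question might be split into separate items. The `PK` and `ARTICLE#` key formats follow the query in this answer. The helper name `toTableItems` and the topic item's sort key `#METADATA` are assumptions made for illustration; `#METADATA` was picked only because it sorts before every `ARTICLE#...` key.

```javascript
// Split one nested topic object into a topic item plus one item per article.
// ASSUMPTION: the topic item's sort key ("#METADATA") is a placeholder, not
// taken from the original answer.
const toTableItems = (topic) => {
  const pk = `TOPIC#${topic.topicId}`;
  const { articles = [], ...topicAttributes } = topic;

  const topicItem = {
    PK: pk,
    SK: "#METADATA",
    ...topicAttributes,
    articleCount: articles.length,
  };

  // Each article becomes its own item in the topic's partition.
  const articleItems = articles.map((article) => ({
    PK: pk,
    SK: `ARTICLE#${article.articleId}`,
    ...article,
  }));

  return [topicItem, ...articleItems];
};
```

With this layout the topic item stays small no matter how many articles the topic has, because the articles no longer count toward its 400 KB limit.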

    With this approach, you can fetch and paginate articles in a topic with a query that looks like this:

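    const getTopicArticles = async (topicId, lastArticleId = '', limit = 5) => {
      const params = {
        TableName: "<TableName>", // substitute with the right table name
        // Plain attribute names are used here: a "#SK" placeholder would
        // require a matching ExpressionAttributeNames entry.
        KeyConditionExpression: "PK = :pk AND SK > :sk",
        ExpressionAttributeValues: {
          ":pk": `TOPIC#${topicId}`,
          // Pass the last articleId of the previous page; '' starts from the first article.
          ":sk": `ARTICLE#${lastArticleId}`,
        },
        Limit: limit, // page size
      };

      // Make the DynamoDB Query API call with `params` here.
      // Note: "SK > :sk" also matches any item in this partition whose
      // sort key sorts after the ARTICLE# keys.
    };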
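As an alternative to comparing sort keys by hand, the Query API returns a `LastEvaluatedKey` that can be passed back as `ExclusiveStartKey` to fetch the next page. The sketch below uses that mechanism together with `begins_with`, so only `ARTICLE#` items are matched. The helper names (`buildArticlePageParams`, `getArticlePage`) and the injected `runQuery` function are hypothetical. With the AWS SDK for JavaScript v3, `runQuery` might be something like `(params) => docClient.send(new QueryCommand(params))` using `@aws-sdk/lib-dynamodb`.

```javascript
// Build the Query input for one page of a topic's articles.
// Values are plain JS values, as expected by a DocumentClient-style client.
const buildArticlePageParams = (tableName, topicId, exclusiveStartKey, limit = 5) => ({
  TableName: tableName,
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: {
    ":pk": `TOPIC#${topicId}`,
    ":prefix": "ARTICLE#",
  },
  Limit: limit,
  // Only send ExclusiveStartKey when continuing from a previous page.
  ...(exclusiveStartKey ? { ExclusiveStartKey: exclusiveStartKey } : {}),
});

// Fetch one page. `runQuery` is a hypothetical function that executes the
// Query and resolves to the raw response ({ Items, LastEvaluatedKey }).
const getArticlePage = async (runQuery, tableName, topicId, exclusiveStartKey, limit = 5) => {
  const result = await runQuery(
    buildArticlePageParams(tableName, topicId, exclusiveStartKey, limit)
  );
  return {
    articles: result.Items ?? [],
    // null means there are no more pages.
    nextKey: result.LastEvaluatedKey ?? null,
  };
};
```

A caller would keep `nextKey` from each response and hand it back as `exclusiveStartKey` on the next request, stopping once it is `null`.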
    

    PS: The table structure above might look odd if you're not familiar with DynamoDB single-table design; it is worth reading up on that pattern.