Search code examples
typescriptamazon-web-servicesamazon-dynamodb

How to return only one primary key entry based on sort key for dynamodb?


I have a table Users that has fields name (PK), version (SK) and region(GSI). I want to write an AppSync resolver that fetches all the latest versions of entities for a specific region. For example, if I have the next entries:

name version region
1 1 Europe
1 2 Europe
2 1 Europe
3 1 Africa

For Europe region the API should return only two entries:

name version region
1 2 Europe
2 1 Europe

I prepare DynamoDBQueryRequest this way:

{
    operation: 'Query',
    index: 'RegionIndex',
    query: {
        expression: 'region = :region ',
        expressionValues: util.dynamodb.toMapValues({ ':region': region }),
    },
    select: 'ALL_ATTRIBUTES'
};

But I get all versions for the entries inside of specified region. How to change the request to get only latest versions?


Solution

  • Some options for getting just the latest version

    Option 1: latest-version attribute

    Use a new column, latest-version, to keep track of which item is latest for each user.

    When you write a new user version

    When you write a new user version, the steps are:

    Fetch the latest version of the user

    If you don't already have it in the client app (the one running the queries), grab the latest version of the user. You are probably already doing this. But you can grab it by either (a) querying the base table with ScanIndexForward set to false and Limit set to 1, or (b) by including the latest-version that we're adding here in a FilterExpression. (a) seems better because it consumes less read capacity (with a filter it reads items before it applies the filter), and because it doesn't depend on latest-version always being right.

    Write the new version of the user

    Write the item as you are currently doing, but add an attribute with data type Boolean called latest-version and set it to true.

    Update the old latest version's latest-version attribute

    After you write the new version of the user, update the old item, the one that until recently was the latest version, setting its latest-version attribute to false.

    When you get the users by region

    When you query your GSI for the latest version of each user for a given region, in your KeyConditionExpression you'll specify that the latest-version attribute should be true.

    Handle the case in your client code where multiple versions of the user have latest-version set to true by looping over the items and finding the highest-numbered version for each user.

    Option 2: Deal with it on the client

    Query the GSI for the users in a region, sort in descending order by version, and, either in your resolver or in the browser, get the first item for each user as the latest version of that user.

    Common to both options: sort key on the GSI

    Either way, you'll need a sort key on the GSI, so that you can restrict which items get sent (Option 1) or loop through them efficiently to find the latest version for each user (Option 2). With Option 1 the GSI sort key is latest-version, with Option 2 it's version.

    Edit: With Option 2, having a sort key on the GSI would not be mandatory. You'll have all the items for the region on the client, and it shouldn't be much more expensive to visit all the items as opposed to one item per user. However, you'd need the GSI sort key, if you have one, to be a new attribute called user-version, user concatenated with version. That's because, if you're going to grab the first item for each user within the region as the latest, you want DynamoDB to sort the result set by user then version, not just by version.

    Pros and cons

    Option 1 could be better if you have a ton of updates to the user items, new versions of a given user getting written frequently. Option 2 is simpler but consumes more read capacity and transmits more data.

    Overall, which is better?

    I'd use Option 2: Deal with it on the client.

    Reasons:

    1. It's simpler.
    2. You're probably going to have to deal with paginated results already, since you're getting all the users in a region.
    3. I think the algorithm I described for maintaining latest-version is right, but maybe something would go wrong, for example if there are edge cases I'm not considering with failed updates or multiple clients updating the latest version simultaneously.
    4. It reflects a strategy of pushing processing onto the client that I've heard DynamoDB experts advocate, for example in #8 here and starting around 54:41 in this video ("Send all versions when queried...Process with client-side logic"). Fundamentally, DynamoDB offers less database-side computation options, but gives answers more efficiently, which favors client-side processing.