Search code examples
amazon-dynamodbdynamodb-queries

Read most recent records in DynamoDB


I have a table in DynamoDB with the following structure:

  • UserID (String - Partition key)
  • AsOfTimestamp (number - sort Key)
  • Product (string) ... and some other attributes.

I also have an LSI defined like this:

  • UserID (String - Partition Key)
  • Product (String - Sort Key)
  • AsOfTimestamp (Number) ... and other attributes

My problem is, I need to read the most recent records for all Products for a given UserID. For example, a return sample from this query could be:

UserID Product AsOfDate
User1 Product1 1617816274
User1 Product2 1617816288

My issue is, if I use the index on the table I can get the latest records but that does not guarantee that I will get records for each product, I could get for example a lot of records User1-Product1 before I see the most recent User1-Product2.

If I use the LSI, I can get records for a given UserId sorted by product but I will be forced to sort the results by AsOfDate and read the entire result set to get the most recent ones.

Thanks!


Solution

  • If I understand what you're asking for I think you're going to need another record for that data. What you want is a single AsOfDate for each user/product combination, where the AsOfDate is the most recent. If you add a record that is keyed on UserID and Product then you'll have exactly one record for each. To make that work you'll likely need to change your table structure to support the single table design pattern (or store this data in a different table). In a single table design you might have something like this:

    pk sk UserID Product AsOfDate
    User1 Purchase|Product1|1617816274 User1 Product1 1617816274
    User1 MostRecent|Product1 User1 Product1 1617816274
    User1 Purchase|Product2|1617816274 User1 Product2 1617816274
    User1 Purchase|Product2|1617816288 User1 Product2 1617816288
    User1 MostRecent|Product2 User1 Product2 1617816288

    Then, to get all the most recent records you query for pk = userId and begins_with(sk, 'MostRecent|'). To get the other records you query for pk = userId and begins_with(sk, 'Purchase|'). Your access pattern requirements might have you changing that some, but the idea should be similar to this.

    Whenever you do a new "Purchase" you would insert the new row for that, and update the "MostRecent" row as well (you can do that in a transaction if you need to, or use a DynamoDB stream to do that update).