Tags: java, amazon-dynamodb, aws-sdk, aws-java-sdk

DynamoDB: Getting only most recent items of all unique hash keys


I have a DynamoDB table with partition key id and sort key date_epoch.

I'll have items like this:

id  |  date_epoch
-----------------
1   |  1535961978
2   |  1535961996
1   |  1535962033
2   |  1535962055
3   |  1535962064
5   |  1535962073
1   |  1535962080
2   |  1535962085

For each unique id, I want only its most recent item (the one with the highest date_epoch). So from this sample data, I want only the following results:

id  |  date_epoch
-----------------
3   |  1535962064
5   |  1535962073
1   |  1535962080
2   |  1535962085
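
To pin down exactly what is being asked, here is the same reduction done in memory over the sample rows: group by id and keep the largest date_epoch. This is only a plain-Java sketch of the desired result; the class names (LatestPerIdDemo, Row) are hypothetical and nothing here talks to DynamoDB.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LatestPerIdDemo {
    // Hypothetical in-memory stand-in for one table item: partition key id, sort key date_epoch.
    static final class Row {
        final int id;
        final long dateEpoch;
        Row(int id, long dateEpoch) { this.id = id; this.dateEpoch = dateEpoch; }
    }

    // The sample rows from the question, in the order listed.
    static final List<Row> SAMPLE = List.of(
        new Row(1, 1535961978L), new Row(2, 1535961996L),
        new Row(1, 1535962033L), new Row(2, 1535962055L),
        new Row(3, 1535962064L), new Row(5, 1535962073L),
        new Row(1, 1535962080L), new Row(2, 1535962085L));

    // For each id keep only the largest date_epoch; TreeMap just orders the output by id.
    static Map<Integer, Long> latestPerId(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
            r -> r.id, r -> r.dateEpoch, Long::max, TreeMap::new));
    }

    public static void main(String[] args) {
        latestPerId(SAMPLE).forEach((id, ts) -> System.out.println(id + " | " + ts));
    }
}
```

The printed rows match the expected result table, just ordered by id instead of by date_epoch.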

I can figure out how to do this, but only with very ugly code: I get each unique id, then query each id individually for only its most recent item using .withScanIndexForward(false) and .withMaxResultSize(1) (as shown in this example and this example). It seems like there must be a better way to do this.
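
The per-id loop described above can be modeled in memory roughly like this. The assumption is that each partition behaves like a map sorted by date_epoch, so "query one id with ScanIndexForward set to false and a limit of 1" becomes "take the last entry of that partition". The class and method names are hypothetical; against the real table each lookup would be a separate Query request (for example a QuerySpec with .withScanIndexForward(false) and .withMaxResultSize(1) in the SDK v1 document API), and collecting the distinct ids would itself need a scan.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class PerIdLatestQuery {
    // Hypothetical in-memory model of the table: id -> items sorted by sort key date_epoch.
    private final Map<Integer, NavigableMap<Long, String>> partitions = new TreeMap<>();
    // Number of simulated Query requests; each one would be a separate round trip to DynamoDB.
    int queriesIssued = 0;

    void put(int id, long dateEpoch) {
        partitions.computeIfAbsent(id, k -> new TreeMap<>())
                  .put(dateEpoch, "item " + id + " @ " + dateEpoch);
    }

    // Stands in for one Query on a single id, descending by sort key, limit 1.
    Map.Entry<Long, String> queryLatest(int id) {
        queriesIssued++;
        NavigableMap<Long, String> partition = partitions.get(id);
        return partition == null ? null : partition.lastEntry();
    }

    // One query per distinct id. Here the ids come from the map's key set; against a
    // real table, listing the distinct ids would require a Scan first.
    Map<Integer, Long> latestForAllIds() {
        Map<Integer, Long> result = new TreeMap<>();
        for (int id : partitions.keySet()) {
            result.put(id, queryLatest(id).getKey());
        }
        return result;
    }

    static PerIdLatestQuery sampleTable() {
        PerIdLatestQuery t = new PerIdLatestQuery();
        long[][] rows = {
            {1, 1535961978L}, {2, 1535961996L}, {1, 1535962033L}, {2, 1535962055L},
            {3, 1535962064L}, {5, 1535962073L}, {1, 1535962080L}, {2, 1535962085L}};
        for (long[] r : rows) {
            t.put((int) r[0], r[1]);
        }
        return t;
    }

    public static void main(String[] args) {
        PerIdLatestQuery table = sampleTable();
        System.out.println(table.latestForAllIds() + " using " + table.queriesIssued + " queries");
    }
}
```

The cost grows with the number of distinct ids (one request each), which is why the approach feels clumsy even though each individual query is cheap.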

Can I set a scan filter that limits the results to one item per id, or is there something else I haven't thought of?


Solution

  • This is more of a comment than an answer, but here goes: no, you can't get the result you are looking for with a scan. There is no way to craft a filter that does this, and even if there were, you would still pay the full read cost of a scan (you would only save on network bandwidth).

    Your options are:

    1. Keep using your current technique: get the unique ids, then iterate over them and query each one with a limit of 1

    2. Use two tables: one to hold the historical values and one to hold only the most recent value for each id

    Note that the second option comes with some caveats: you have to be tolerant of eventual consistency, and you must not update any single item more than 1000 times per second (the practical limit is realistically lower, maybe 600-700).
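
On the scan-filter point: in DynamoDB a filter expression is applied after the Scan has read the items, and the read capacity consumed is the same with or without the filter. The sketch below models only that order of operations in plain Java. It counts items read rather than capacity units (which in reality depend on item size), and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ScanFilterCost {
    // Hypothetical in-memory item: partition key id and sort key date_epoch.
    static final class Item {
        final int id;
        final long dateEpoch;
        Item(int id, long dateEpoch) { this.id = id; this.dateEpoch = dateEpoch; }
        @Override public String toString() { return id + " | " + dateEpoch; }
    }

    // Outcome of one simulated Scan: how many items were read versus which ones came back.
    static final class ScanResult {
        final int itemsRead;
        final List<Item> returned;
        ScanResult(int itemsRead, List<Item> returned) {
            this.itemsRead = itemsRead;
            this.returned = returned;
        }
    }

    // Every item is read first; only then is the filter tested against that single item.
    // The filter sees one item at a time, so it cannot compare items that share an id.
    static ScanResult scan(List<Item> table, Predicate<Item> filter) {
        int read = 0;
        List<Item> returned = new ArrayList<>();
        for (Item item : table) {
            read++;
            if (filter.test(item)) {
                returned.add(item);
            }
        }
        return new ScanResult(read, returned);
    }

    static List<Item> sample() {
        return List.of(
            new Item(1, 1535961978L), new Item(2, 1535961996L),
            new Item(1, 1535962033L), new Item(2, 1535962055L),
            new Item(3, 1535962064L), new Item(5, 1535962073L),
            new Item(1, 1535962080L), new Item(2, 1535962085L));
    }

    public static void main(String[] args) {
        // Even a very selective filter still reads every item in the table.
        ScanResult r = scan(sample(), item -> item.id == 3);
        System.out.println("read " + r.itemsRead + ", returned " + r.returned);
    }
}
```

Only one item comes back, but all eight were read, which is the "full price of a scan" the answer refers to.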
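
The two-table option can be sketched in memory as follows, under the assumption that every write is appended to the history table while the "latest" table is overwritten only when the incoming date_epoch is newer. Against DynamoDB that guard would presumably be a PutItem with a ConditionExpression that passes only when no item exists for the id or the stored date_epoch is older; that detail is an assumption, not part of the answer. The class name is hypothetical, and because both maps are updated synchronously here, the sketch does not show the eventual-consistency caveat mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TwoTableLatest {
    // Hypothetical stand-in for the history table: every (id, date_epoch) item ever written.
    final List<long[]> history = new ArrayList<>();
    // Hypothetical stand-in for the latest table: exactly one date_epoch per id.
    final Map<Integer, Long> latest = new TreeMap<>();

    // Appends to history, then overwrites the latest entry only if this write is newer.
    // Returns whether the latest table changed.
    boolean record(int id, long dateEpoch) {
        history.add(new long[] {id, dateEpoch});
        Long current = latest.get(id);
        if (current == null || current < dateEpoch) {
            latest.put(id, dateEpoch);
            return true;
        }
        // With a real conditional write, this branch would be a failed condition check the caller ignores.
        return false;
    }

    public static void main(String[] args) {
        TwoTableLatest tables = new TwoTableLatest();
        long[][] sample = {
            {1, 1535961978L}, {2, 1535961996L}, {1, 1535962033L}, {2, 1535962055L},
            {3, 1535962064L}, {5, 1535962073L}, {1, 1535962080L}, {2, 1535962085L}};
        for (long[] row : sample) {
            tables.record((int) row[0], row[1]);
        }
        // "Most recent item for every id" is now a read of the small latest table only.
        System.out.println(tables.latest);
        // A late, out-of-order write still lands in history but does not replace the newer value.
        System.out.println(tables.record(1, 1535962033L));
    }
}
```

The trade-off is that every write costs two writes, and each id's row in the latest table absorbs all updates for that id, which is where the per-item write limit in the note comes from.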