This article (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html) talks about a technique for sharding global secondary index values across multiple partitions, by introducing a random integer as the partition key.
That makes sense to me, but the article does not clearly explain how to then query that index. Let's say I'm using a random integer from 1-10 as the partition key, and a number as the sort key, and I want to fetch the 3 records with the highest sort key value (from all partitions).
Would I need to do 10 separate queries, sorting each one, with a limit of 3 items, then do an in-memory sort of the resulting 30 items and pick the first 3? That seems needlessly complicated, and not very efficient for the client.
Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest vavlue?
Would I need to do 10 separate queries
Yes. This is called a scatter read in the Dynamo docs...
Normally the client would do so with multiple threads...so while it adds complexity, efficiency is usually good.
Why the limit 3? That requirement seems to be the bigger cause of inefficiency.
Is there some way to do a single DynamoDB operation that queries all 10 partitions, does the sorting, and just returns the 3 records with the highest vavlue?
The only way to query all partitions is with a full table Scan
. But that doesn't provide sorting & ordering. You'd still need to do it in your app. The scan would be a lot less efficient than the scatter read.
If this is a "Top 3 sellers" type list...I believe the recommended practice to to (periodically) calculate & store the results. Rather than having to constantly derive the results. Take a look here: Using Global Secondary Indexes for Materialized Aggregation Queries