Search code examples
amazon-dynamodbdynamodb-queries

Query DynamoDB based on whether attribute is member of a group


This is a simplified version of my problem using a DynamoDB Table. Most items in the Table represent sales across multiple countries. One of my required access patterns is to retrieve all sales in countries which belong to a certain country_grouping between a range of order_dates. The incoming stream of sales data contains the country attribute, but not the country_grouping attribute.

Another entity in the same Table is a reference table, which is infrequently updated, which could be used to identify the country_grouping for each country. Can I design a GSI or otherwise structure the table to retrieve all sales for a given country_grouping between a range of order dates?

Here's an example of the Table structure:

PK SK sale_id order_date country country_grouping
SALE#ID#1 ORDER_DATE#2022-06-01 1 2022-06-01 UK
SALE#ID#2 ORDER_DATE#2022-09-01 2 2022-09-01 France
SALE#ID#3 ORDER_DATE#2022-07-01 3 2022-07-01 Switzerland
COUNTRY_GROUPING#EU COUNTRY#France France EU
COUNTRY_GROUPING#NATO COUNTRY#UK UK NATO
COUNTRY_GROUPING#NATO COUNTRY#France France NATO

Possible solution 1

As the sales items are streamed into the Table, query the country_grouping associated with the country in the sale, and write the corresponding country_grouping to each sale record. Then create a GSI where country_grouping is the partition key and the order_date is the sort key. This seems expensive to me, consuming 1 RCU and 1 WCU per sale record imported. If country groupings changed (imagine the UK rejoins the EU), then I would run an update operation against all sales in the UK.

Possible solution 2

Have the application first query to retrieve every country in the desired country_grouping, then send an individual request for each country using a GSI where the partition key is country and the order_date is the sort key. Again, this seems less than ideal, as I consume 1 WCU per country, plus the 1 WCU to obtain the list of countries.

Is there a better way?


Solution

  • Picking an optimal solution depends on factors you haven't mentioned:

    • How quickly you need it to execute
    • How many sales records per country and country group you insert
    • How many sales records per country you expect there to be in the db at query time
    • How large a Sale item is

    For example, if your Sale items are large and/or you insert a lot every second, you're going to need to worry about creating a hot key in the GSI. I'm going to assume your update rate is not too high, the Sale item size isn't too large, and you're going to have thousands or more Sale items per country.

    If my assumptions are correct, then I'd go with Solution 2. You'll spend one read unit (it's not a WCU but rather an RCU, and it's only half a read unit if eventually consistent) to Query the country group and get a list of countries. Do one Query for each country in that group to pull all the Sale items matching the specific time range for that country. Since there are lots of matching sales, the cost is about the same. One 400 KB pull from a country_grouping PK is the same cost as 4 100 KB pulls from four different country PKs. You can also do the country Query calls in parallel, if you want, to speed execution. If you're returning megabytes of data or more, this will be helpful.

    If in fact you have only a few sales per country, well, then any design will work.