Search code examples
amazon-web-servicesamazon-dynamodbdynamodb-queriesamazon-dynamodb-indexamazon-dynamodb-data-modeling

Is there a better way to model this access pattern than to use two global secondary indexes (GSI)?


I'm trying to figure out the data model and access patterns for an app keeping track of animal movements between different fields (pastures). There are movement records that look like this:

PK                 FROM            TO          DATE
------------------------------------------------------
ANIMAL#001       FIELD#A       FIELD#B       January 3
ANIMAL#001       FIELD#Q       FIELD#R       September 19
ANIMAL#002       FIELD#A       FIELD#B       January 3
ANIMAL#003       FIELD#C       FIELD#D       March 15
ANIMAL#005       FIELD#F       FIELD#A       April 22

For a specific field, e.g. FIELD#A, I'd like to know all the movements into and out of that field, the date of the movement, and the number of animals. The results should look like:

DATE        FROM        TO          NUMBER_ANIMALS
--------------------------------------------------
January 3   FIELD#A     FIELD#B         2
April 22    FIELD#F     FIELD#A         1

Possible solutions and attempts:

  1. A GSI with PK=FROM, SK=TO. If I query the GSI with PK=FIELD#A, this only gives one half of the picture, that is, movements from FIELD#A. I can't obtain movements to FIELD#A.

  2. A composite attribute like FIELD#A#FIELD#B used as the PK in a GSI. Runs into the same problem as attempt 1.

  3. Two GSI. GSI1 has PK=FROM and GSI2 has PK=TO. I can query GSI1 with PK=FIELD#A and do some post-processing (groupby, count) to get part of the result. I can then query GSI2 with PK=FIELD#A and post-process, getting the rest of the result. This looks like it will work but requires two GSI and two queries. I can't overload one GSI since both columns in use are from the same item.

  4. Some combination of scanning the entire table and filtering the results which I'd rather avoid since there might be 50,000+ items in the entire table.

I can see how to do it with two GSIs, but what's the most efficient way?


Solution

  • I could imagine a slightly different table structure (ANIMALID being the partition key and FIELDID being the sort key):

    ANIMALID | FIELDID | FROM_TO | ...
    ——————————————————————————————————
    ANIMAL#1 | FIELD#A | FROM    | ...
    ANIMAL#1 | FIELD#B | TO      | ...
    ANIMAL#2 | FIELD#C | FROM    | ...
    ANIMAL#2 | FIELD#A | TO      | ...
    

    And a GSI with the following structure:

    FIELDID | ANIMALID | ...
    

    Then you can query the GSI just by FIELDID and aggregate the results.