amazon-web-services amazon-dynamodb amazon-dynamodb-index

When it's worth the tradeoff of using local secondary index in DynamoDB?

I've read guidelines for secondary indexes but I'm not sure when the ability to search fast outweighs the disadvantage of scan over attributes. Let me give you an example.

I am saving game progress data for users. The PK is user ID. I need to be able to:

Find out user progress about a particular game.
Get all finished/in progress games for a user.

Thus, I can design my SK as progress_{state} to be able to query all games by progress fast (state represents started/finished) or I can design my SK as progress_{gameId} to be able to query progress of a given game fast. However, I can't have both using just SK. When I chose one, the other operation will require a scan.

Therefore, I was thinking about using LSI which will add an overhead to the whole table as noted by Amazon here:

Every secondary index means more work for DynamoDB. When you add, delete, or replace items in a table that has local secondary indexes, DynamoDB will use additional write capacity units to update the relevant indexes.

I estimate maximum thousands of types games and I wonder whether it's worth using LSI or whether it's better to use scans for the other operation I choose.

Does anyone has any real experience with such problem? I was not able to find anything on this topic.

Solution

When you are designing DynamoDB tables, the main cost factor comes with IOPS for reads and writes.

This is why avoiding scans are usually better. Scans will consume a significant amount of read IOPS and it will increase with the number of items in the table since scan needs to read all the items in the table before returning the matching items.

Then coming back to your use-case of using SK for progress, it would be better to use attributes and define Secondary Indexes, since you will need to update the state later on (Which is not possible with PK and SK in the table).

So based on your use-case and the information given in the question you can define the schema as;

PK- UserID SK- GameID GSI- Progress (PK)

Query all games by progress fast GSI Progress (PK)

Note: if this is for a particular user; you can change it to LSI Progress.

Query progress of a given game fast (Assuming that for a given user) Query using UserID (PK) and GameID (SK) of the Table