Best Practice for DynamoDB sort key

As I focus on designing DynamoDB with a query-centric approach, I find that the sort key is becoming quite lengthy.

For instance, I plan to use PK as like USER#{UUID} and structure SK as like ONBOARDING#{SCHOOL}#{SCHOOL_ADDRESS}#{MAJOR}#{CREATED_AT}. I believe this structure makes it convenient to query users by {SCHOOL} and also by {SCHOOL_ADDRESS} as well. While there is the option to divide fields and construct GSI, I've heard that minimizing the use of GSI is more cost-effective.

Is there any potential issue with having such a long sort key (excluding size limit)?

Solution

I believe this structure makes it convenient to query users by {SCHOOL} and also by {SCHOOL_ADDRESS} as well.

Incorrect.

When you query DDB, you have to provide the hash (partition) key as an equality and then if you have a sort key you can use one of the following operators.

a = b — true if the attribute a is equal to the value b

a < b — true if a is less than b

a <= b — true if a is less than or equal to b

a > b — true if a is greater than b

a >= b — true if a is greater than or equal to b

a BETWEEN b AND c — true if a is greater than or equal to b, and less than or equal to c.

The following function is also supported:

begins_with (a, substr)— true if the value of attribute a begins with a particular substring.

Note there's no contains.

As it stands, given that USER#{UUID} is basically unique (except for user's who've had more than 1 school/major), you're only going to be able to get data out of the DDB table one record at a time (unless you do a SCAN; which is not recommended).

If you want to query DDB, you need to select a hash key that contains more than 1 record.

Without examples of the data retrieval patterns you need, it's difficult to suggest a good hash/sort key. But unless your design is trivial, you're going to need at least one GSI. Possibly overloaded.

As far as your table, hash key of {SCHOOL} and a sort key of {UUID} would allow you to "query users by school". But of course the order of the returned data would be basically random. You'd probably end up doing all your queries via GSI/LSI. More importantly, it'd break if a user had for instance more than one major at the same school.

Honestly while DDB can be used in instead of a RDBMS, for many applications Aurora is likely a better choice.