Search code examples
amazon-dynamodbserverlessaws-serverlessdynamodb-queriesamazon-dynamodb-index

Fetching parent and child item in single query in DynamoDB


I have the following one-to-many relationship:

Account 1--* User

The Account contains global account-level information, which is mutable.

The User contains user-level information, which is also mutable.

When the user signs-in, they need both Account and User information. (I only know the UserId at this point).

I ideally want to design the schema such that a single query is necessary. However, I cannot determine how to do this without duplicating the Account into each User and thus requiring some background Lambda job to propagate changes to Account attributes across all User objects -- which, for the record, seems like more resource usage (and code to maintain) than simply normalizing the data and having 2 queries on each sign-in: fetch user, then fetch account (using an FK inside the user object that identifies the account).


Is it possible to design a schema that allows one query to fetch both and doesn't require a non-transactional background job to propagate updates? (Transactional batch updates are out of the question, since there's >25 users.) And if not, is the 2-query idea the best / an acceptable method?



Solution

  • I'll focus on one angle in your question - the 2-query idea. In many cases it is indeed an acceptable method, better than the alternatives. In fact in many NoSQL uses, every user-visible request results in significantly more than two database requests. In fact, it is often stated that this is the reason why NoSQL systems care about low tail latencies (i.e., even 99th percentile latencies should be low).

    You didn't say why you wanted to avoid the 2-query solution. The 2-query implementation you presented has two downsides:

    1. It is more costly: you need to do two queries instead of one, costing (when the reads are shorter than 4 KB) double than a single read.
    2. Latency doubles if you need to do the first query, and only then can do the second query.

    There may be tricks you can use to solve both problems, depending on more details of your use case:

    For the latency: You didn't say what is a "user id" in your application. If it is some sort of unique numeric identifier, maybe it can be set up such that the account id can be determined from the user id directly, without a table lookup (e.g., the first bits of the user id are the account id). If this is the case, you can start both lookups at the same time, and not double the latency. The cost will still be double, but not the latency.

    For the cost: If there is a large number of users per account (you said there are more than 25 - I don't know if it's much more or not), it may be useful to cache the Account data, so that not every user lookup will need to read the Account data again - it might often be cached. If Account information rarely changes and consistency of it is not a big deal (I don't know if it is...), you can also get by with doing an "eventual consistency" read for the Account information - which costs half of the regular "consistent" read.