aws-lambda, amazon-dynamodb, amazon-dynamodb-streams

AWS Event-Driven Batch with Asynchronous Inputs That Must Be Paired


I have data coming into two DynamoDB tables; let's call them Widgets and Kerfuffles. Each Widget "has a" Kerfuffle, but a Kerfuffle can belong to several Widgets. Normally I would use DynamoDB Streams to trigger a Lambda function that publishes each Widget-Kerfuffle pair to SNS. However, a Widget and its Kerfuffle don't necessarily arrive together: the Kerfuffle can arrive 5-10 minutes before or after the Widget.

So it seems I can't simply trigger a Lambda when a Widget or a Kerfuffle is created, because the other half might not be present yet (and I don't want to send duplicate Widgets downstream either).

Any suggestions on how to handle this?


Solution

  • Typing is hard, so let Widget = A and Kerfuffle = B.

    1. Real-time: process notifications for new A's and new B's. For each A notification, check whether its B is present. If it isn't, stop; otherwise, process that A. For each B notification, collect all present A's that reference it and process them all. Note that you'll need some form of locking if you want to avoid processing an A twice: when an A arrives very close to its B, both handlers can see the complete pair and both will process it.

    2. Near-real-time: once in a while (every t minutes), find all A's that have not been processed. Process all those that have matching B's, and mark those A's as processed.
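Option 1 can be sketched in memory as follows. The handler names (`on_new_widget`, `on_new_kerfuffle`) and the `PairingStore` class are hypothetical stand-ins for the two tables and their stream-triggered Lambda handlers. In a real deployment, `claim` would be a DynamoDB conditional write (for example, `put_item` into a separate "processed" table with `ConditionExpression="attribute_not_exists(widget_id)"`); here a `threading.Lock` plays that role so that each A is published at most once even when both handlers see the pair.

```python
import threading


class PairingStore:
    """Hypothetical in-memory stand-in for the Widgets/Kerfuffles tables."""

    def __init__(self):
        self.widgets = {}        # widget_id -> kerfuffle_id it references
        self.kerfuffles = set()  # ids of kerfuffles that have arrived
        self.processed = set()   # widget ids already published
        self._lock = threading.Lock()

    def claim(self, widget_id):
        """Atomically mark a widget as processed.

        Returns True only for the first caller, playing the role of a
        conditional write that fails if the item already exists."""
        with self._lock:
            if widget_id in self.processed:
                return False
            self.processed.add(widget_id)
            return True


def on_new_widget(store, widget_id, kerfuffle_id, out):
    # The item is stored before the handler checks, as with a stream record.
    store.widgets[widget_id] = kerfuffle_id
    if kerfuffle_id not in store.kerfuffles:
        return  # B not here yet; on_new_kerfuffle will pick this A up
    if store.claim(widget_id):
        out.append((widget_id, kerfuffle_id))  # stand-in for an SNS publish


def on_new_kerfuffle(store, kerfuffle_id, out):
    store.kerfuffles.add(kerfuffle_id)
    # Collect every A already waiting on this B; claim() prevents duplicates
    # if on_new_widget sees the same pair at the same time.
    for widget_id, k in list(store.widgets.items()):
        if k == kerfuffle_id and store.claim(widget_id):
            out.append((widget_id, k))  # stand-in for an SNS publish


if __name__ == "__main__":
    store, out = PairingStore(), []
    on_new_widget(store, "w1", "k1", out)  # k1 not here yet: nothing published
    on_new_kerfuffle(store, "k1", out)     # publishes ("w1", "k1")
    on_new_widget(store, "w2", "k1", out)  # k1 present: publishes ("w2", "k1")
    print(out)  # [('w1', 'k1'), ('w2', 'k1')]
```

Without `claim`, an A that both handlers see would be published twice; the lock (or conditional write) is what makes the two streams safe to process concurrently.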

    Tradeoffs:

    Method 1:

    • You process a bunch of notifications that don't actually matter, because you can't act on A's that don't have B's yet.
    • You add the complexity of processing two separate streams, which can interfere with one another unless you keep your processing single-threaded.

    Method 2:

    • You delay processing by t minutes. This can be inconsequential or extremely impractical, depending on your application.
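Option 2 can be sketched as a single pass of a periodic job, again with hypothetical names (`Widget`, `sweep`) and in-memory data in place of the tables. In AWS this would typically be a Lambda function run on an EventBridge schedule every t minutes; that wiring is assumed and not shown. Because only one sweep runs at a time, no lock is needed, which is the flip side of the Method 2 tradeoff above.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    widget_id: str
    kerfuffle_id: str
    processed: bool = False


def sweep(widgets, kerfuffle_ids, publish):
    """One run of the periodic job (hypothetical).

    Publishes every unprocessed widget whose kerfuffle has arrived, marks
    it processed, and returns how many were published this run."""
    count = 0
    for w in widgets:
        if w.processed or w.kerfuffle_id not in kerfuffle_ids:
            continue  # already sent, or still waiting for its kerfuffle
        publish(w.widget_id, w.kerfuffle_id)
        # Marking after publishing means a crash between the two steps
        # re-sends this widget on the next run (at-least-once delivery).
        w.processed = True
        count += 1
    return count


if __name__ == "__main__":
    widgets = [Widget("w1", "k1"), Widget("w2", "k2"), Widget("w3", "k1")]
    sent = []
    print(sweep(widgets, {"k1"}, lambda w, k: sent.append((w, k))))        # 2
    print(sweep(widgets, {"k1", "k2"}, lambda w, k: sent.append((w, k))))  # 1
    print(sent)  # [('w1', 'k1'), ('w3', 'k1'), ('w2', 'k2')]
```

A widget whose kerfuffle hasn't arrived is simply skipped and retried on the next run, so the worst-case latency is roughly t minutes after the later of the two items arrives.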