
Separate table for time series and stats in a single table design on DynamoDB?


It seems counterintuitive, but most of the single-table design examples I have seen use a separate table for time series data and summary stats.

Why would we do this when we already have that single table?

In the examples I have seen there is a DynamoDB stream on the main table which updates or adds data to the stats table. Why not just have it push data back to the main table?

Are there any performance considerations?

I know that the same DynamoDB stream would trigger again when the new data gets pushed back, but with the recently released event filters for streams, we could tell it to trigger only when the item's Type is not "stats", or something along those lines.
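To make the filtering idea concrete, here is a minimal sketch in Python. It assumes the stats items carry a string attribute named Type with the value "stats" (both are placeholders taken from the wording above), and the function names are hypothetical. The returned dictionary has the shape that Lambda's event source mapping accepts as FilterCriteria (for example via boto3's create_event_source_mapping); nothing here calls AWS. The handler repeats the check as a safety net in case the filter is missing or misconfigured.

```python
import json

# Assumption: stats items written back to the main table are marked Type = "stats".
STATS_TYPE = "stats"


def build_filter_criteria(stats_type=STATS_TYPE):
    """Build a FilterCriteria dict that lets through only records whose new image
    has a string attribute Type that is anything but the stats marker."""
    pattern = {
        "dynamodb": {
            "NewImage": {
                "Type": {"S": [{"anything-but": [stats_type]}]}
            }
        }
    }
    # Lambda expects each filter pattern as a JSON string.
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}


def is_stats_record(record, stats_type=STATS_TYPE):
    """Return True if a stream record describes a stats item."""
    new_image = record.get("dynamodb", {}).get("NewImage", {})
    return new_image.get("Type", {}).get("S") == stats_type


def handler(event, context=None):
    """Hypothetical stream handler: skip stats items so that writing aggregates
    back to the same table cannot cause an update loop."""
    to_aggregate = [r for r in event.get("Records", []) if not is_stats_record(r)]
    # The real aggregation and write-back would go here.
    return len(to_aggregate)
```

Note that a pattern like this only matches records that actually contain a new image with a Type attribute, so delete events would need their own filter if they should also update the stats.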


Solution

  • Potential reasons to keep them separate, off the top of my head:

    • You want different security rules.
    • You want the summary table in on-demand capacity mode to handle spiky read traffic, while the time series table uses provisioned capacity to handle smooth write traffic.
    • You want to back up the summary data but not the time series data.
    • You want to group the time series data into a separate table per time period (e.g. per month) and later archive or delete whole tables.
    • You want to put the time series data into the Standard-IA (Standard-Infrequent Access) table class.

    Reasons to keep them together:

    • If you want both data sets in provisioned capacity mode, combining them in one table pools the traffic, which smooths the spikes and makes auto scaling work better.
    • It is one less thing to manage.
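
Several of the reasons for separate tables come down to per-table settings that cannot differ within a single table. As a rough sketch, the Python below builds the parameter dictionaries you might pass to boto3's create_table for each table. The table names, key attribute names (PK, SK), and capacity numbers are all made up for illustration, and nothing here calls AWS.

```python
# Assumption: both tables use a generic PK/SK string key, as in many single-table designs.
KEY_SCHEMA = [
    {"AttributeName": "PK", "KeyType": "HASH"},
    {"AttributeName": "SK", "KeyType": "RANGE"},
]
ATTRIBUTE_DEFINITIONS = [
    {"AttributeName": "PK", "AttributeType": "S"},
    {"AttributeName": "SK", "AttributeType": "S"},
]


def timeseries_table_name(year, month):
    """Hypothetical naming scheme: one time series table per month,
    so an old month can be archived or deleted as a whole table."""
    return f"timeseries-{year:04d}-{month:02d}"


def summary_table_params():
    """Summary stats: on-demand capacity for spiky reads, default table class."""
    return {
        "TableName": "summary-stats",  # hypothetical name
        "KeySchema": KEY_SCHEMA,
        "AttributeDefinitions": ATTRIBUTE_DEFINITIONS,
        "BillingMode": "PAY_PER_REQUEST",
        "TableClass": "STANDARD",
    }


def timeseries_table_params(year, month, read_units=5, write_units=50):
    """Time series: provisioned capacity for smooth writes, Standard-IA table class.
    The capacity numbers are placeholders, not recommendations."""
    return {
        "TableName": timeseries_table_name(year, month),
        "KeySchema": KEY_SCHEMA,
        "AttributeDefinitions": ATTRIBUTE_DEFINITIONS,
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
        "TableClass": "STANDARD_INFREQUENT_ACCESS",
    }
```

With boto3 installed and credentials configured, each dictionary could be unpacked into a call such as client.create_table(**summary_table_params()). Backups and access policies would likewise be configured per table.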