In the tutorial https://docs.aws.amazon.com/iot/latest/developerguide/iot-ddb-rule.html it's suggested to use the time the sample was recorded as the partition key instead of the device id.
However, looking at https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/ it would seem that the device id would make more sense: "Use high-cardinality attributes. These are attributes that have distinct values for each item, like emailid, employee_no, customerid, sessionid, orderid, and so on."
It also says: "In most cases, all items with the same partition key are stored together in a collection, which we define as a group of items with the same partition key but different sort keys." So when using the timestamp as the partition key, messages from the same device would be split between different collections. Does this have any significance?
I understand that the partition key must be unique when it is used on its own, so in that sense the device id alone wouldn't work. But when a partition key is combined with a sort key, only the combination of the two has to be unique.
You are correct, it would make much more sense to use device_id as the partition key, with the sample time as the sort key. Otherwise your stored data is very hard to query in any useful way.
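
As a rough sketch of what that key layout could look like, the snippet below builds the parameters for a DynamoDB CreateTable request. The table name `device_samples`, the string attribute types, and on-demand billing are assumptions made for illustration, not something taken from the tutorial.

```python
# Hypothetical table name; adjust to your own setup.
TABLE_NAME = "device_samples"


def create_table_params(table_name: str = TABLE_NAME) -> dict:
    """Build keyword arguments for a DynamoDB CreateTable request."""
    return {
        "TableName": table_name,
        # Assumption: both key attributes are stored as strings ("S").
        "AttributeDefinitions": [
            {"AttributeName": "device_id", "AttributeType": "S"},
            {"AttributeName": "sample_time", "AttributeType": "S"},
        ],
        "KeySchema": [
            # Partition key: groups all samples of one device together.
            {"AttributeName": "device_id", "KeyType": "HASH"},
            # Sort key: orders the samples of a device by time.
            {"AttributeName": "sample_time", "KeyType": "RANGE"},
        ],
        # Assumption: on-demand billing, so no throughput settings are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").create_table(**create_table_params())`.
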
Having data grouped by device_id means you can make requests like "give me the latest output from a given device" or "give me all of the outputs from a device for the last 30 days".
Using the timestamp as the partition key gives you none of that. It can only answer "give me all of the devices that emitted a metric at this exact second, microsecond, or whatever the granularity of the timestamp is".
device_id (PK) | sample_time (SK) | data |
---|---|---|
123 | 2023-01-01T00:00:00Z | device data |
123 | 2023-01-02T00:00:00Z | device data |
123 | 2023-01-03T00:00:00Z | device data |
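
To make the two access patterns concrete, here is a minimal sketch that builds the parameters for the corresponding DynamoDB Query requests. It assumes a hypothetical table named `device_samples`, that `device_id` and `sample_time` are both stored as strings, and that `sample_time` is always written as a UTC ISO 8601 string in one fixed format so that string comparison matches time order.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table name; adjust to your own setup.
TABLE_NAME = "device_samples"


def latest_sample_query(device_id: str) -> dict:
    """Query parameters for the most recent sample of one device."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "device_id = :d",
        "ExpressionAttributeValues": {":d": {"S": device_id}},
        # Read the sort key in descending order, so the newest item is first.
        "ScanIndexForward": False,
        "Limit": 1,
    }


def recent_samples_query(device_id: str, now: datetime, days: int = 30) -> dict:
    """Query parameters for all samples of one device in the last `days` days.

    `now` should be a timezone-aware datetime; it is converted to UTC.
    """
    since = (now.astimezone(timezone.utc) - timedelta(days=days)).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return {
        "TableName": TABLE_NAME,
        # Equality on the partition key plus a range condition on the sort key.
        "KeyConditionExpression": "device_id = :d AND sample_time >= :since",
        "ExpressionAttributeValues": {
            ":d": {"S": device_id},
            ":since": {"S": since},
        },
    }
```

With boto3, either dictionary could be passed as `boto3.client("dynamodb").query(**params)`. Neither request is possible as a Query when the timestamp is the partition key, because a Query always needs an exact partition key value.
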
I would suggest leaving feedback at the bottom of the tutorial page; that will get it to the right team.