I am currently working with a stream of data that comes from an IoT rule and writes the readings to a DynamoDB table.
From there I need to rearrange the data to make predictions. Every new row triggers a Lambda function, but I need the last 96 table rows to work with.
My problem is: 1) how can I query the table, and 2) how can I transform the result into something familiar like a pandas DataFrame?
The table has a timestamp column (the format is still open to be decided).
Ultimately it depends on how your data is structured and how you want to access it. But to answer your question: when DynamoDB Streams invokes your Lambda, you can use a `Query` or a `Scan` to retrieve the data from DynamoDB.

My assumption here is that you need to retrieve the last 96 updates based on timestamp. A `Scan` will not let you do that efficiently; you need a `Query`, and that in turn dictates your data model.
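For orientation, a stream-triggered handler receives the newly written item in the event but still has to fetch the history itself. This is only a minimal sketch (the handler body and return value are placeholders; only the `event['Records']` shape comes from DynamoDB Streams):

```python
def lambda_handler(event, context):
    # DynamoDB Streams batches changes into event['Records']; each record
    # carries the new row, but the previous readings still have to be queried.
    for record in event.get('Records', []):
        new_image = record['dynamodb'].get('NewImage', {})
        # ...query the last 96 items (see below) and build your features here...
    return {'statusCode': 200}
```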
Depending on your specific use-case needs, I would create a Global Secondary Index (GSI) with a static partition key such as `gsi_pk = '1'` and your timestamp as the sort key (a sketch of creating such an index follows the example table below):
| gsi_pk | gsi_sk                   | data |
|--------|--------------------------|------|
| 1      | 2023-01-28T07:25:40.824Z | data |
| 1      | 2023-02-28T07:25:40.824Z | data |
| 1      | 2023-03-28T07:25:40.824Z | data |
| 1      | 2023-04-28T07:25:40.824Z | data |
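If the table does not have such an index yet, it can be added after the fact. The snippet below is only a sketch: the table and index names are placeholders, and it assumes an on-demand (PAY_PER_REQUEST) table, so no `ProvisionedThroughput` block is given for the new index (a provisioned table would need one):

```python
import boto3

client = boto3.client('dynamodb')
client.update_table(
    TableName='my-table',                      # placeholder name
    AttributeDefinitions=[
        {'AttributeName': 'gsi_pk', 'AttributeType': 'S'},
        {'AttributeName': 'gsi_sk', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'my-index',           # placeholder name
            'KeySchema': [
                {'AttributeName': 'gsi_pk', 'KeyType': 'HASH'},   # static value, e.g. '1'
                {'AttributeName': 'gsi_sk', 'KeyType': 'RANGE'},  # timestamp
            ],
            'Projection': {'ProjectionType': 'ALL'},
        }
    }]
)
```

Keep in mind that only items carrying the `gsi_pk` attribute appear in the index, so whatever writes the readings has to set `gsi_pk` to the same static value on every new item.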
Now you can `Query` your GSI and be sure you are returned the last 96 items. Keep in mind that GSIs are eventually consistent.
```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('my-table')  # your table name here
response = table.query(
    IndexName='my-index',
    KeyConditionExpression=Key('gsi_pk').eq('1'),
    Limit=96,                # only the 96 most recent items
    ScanIndexForward=False,  # descending sort-key order, i.e. newest first
)
```
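For the second part of your question, the returned items are plain Python dicts, so building a DataFrame is direct. A sketch, assuming the attribute names from the example table above; note that the boto3 resource layer returns DynamoDB numbers as `Decimal`, so numeric columns may need an explicit cast:

```python
import pandas as pd

df = pd.DataFrame(response['Items'])          # one row per returned item
df['gsi_sk'] = pd.to_datetime(df['gsi_sk'])   # ISO-8601 strings -> datetimes
df = df.sort_values('gsi_sk')                 # the query returned newest first
# e.g. df['data'] = df['data'].astype(float) if that attribute is numeric
```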