Search code examples
amazon-web-servicesaws-lambdaamazon-dynamodbdynamodb-queries

Real time process dynamoDB table with AWS Lambda


I am currently working on a stream of data that comes from an IoT rule and inputs the readings to a dynamoDB table.

From that I need to rearrange the data to make predictions. For every new data entry, every new row I trigger a lambda function, but I need to get the last 96 table rows to manipulate.

My problem is 1- how can I query the table 2- and transform to something familiar like a pandas dataframe?

In the table I have a timestamp column (the format is still open to be decided)


Solution

  • Ultimately it depends on how your data is structured and how you want to access it. But to answer your question, when DynamoDB Streams invokes your Lambda you can use a Query or Scan to retrieve the data from DynamoDB.

    My assumption here is that you need to retrieve the last 96 updates based on timestamp, Scan will not allow you to do that efficiently. You would need to use Query however, that will dictate your data model.

    Depending on your specific use-case needs, I would create a Global Secondary Index with a static partition key such as GSI_PK=1 and your timestamp as the sort key

    gsi_pk gsi_sk data
    1 2023-01-28T07:25:40.824Z data
    1 2023-02-28T07:25:40.824Z data
    1 2023-03-28T07:25:40.824Z data
    1 2023-04-28T07:25:40.824Z data

    Now you can Query your GSI and be sure you are being returned the last 96 items. GSI's are eventually consistent so be aware of that.

       response = table.query(
            IndexName='my-index',
            KeyConditionExpression=Key('gsi_pk').eq('1'),
            Limit=96,
            ScanIndexForward=False
        )