Search code examples
amazon-web-servicesamazon-dynamodbboto3dynamodb-queries

Boto3 and DynamoDB - How to mimic Aggregation


I have a table in DynamoDB in the below format:

DeviceId (PK) SensorDataType SensorValue CurrentTime (SK)
BSMD002 HeartRate 86 2021-03-13 14:50:17.292663
BSMD002 HeartRate 106 2021-03-13 14:50:17.564644
BSMD002 HeartRate 97 2021-03-13 14:50:17.854391

I am pulling the data from this table using boto3 and want to create a new table basis user input ( DeviceId, Date Range).This table will have sensortype wise data - Max,min and avg grouped by minute.

I know DynamoDB doesn't support aggregation and using Streams + Lambda is the more efficient way. But want to understand if there is a way to do this in boto3. So far have worked pulling the data as per below code.

import boto3
import time
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('BSMDataTable')

devicetag = input(" Enter the Device ID to find: ").upper()
datefrom = input("Enter Starting Date in YYYY-MM-DD format: ")
dateto = input("Enter Ending Date in YYYY-MM-DD format: ")

fe = Key('CurrentTime').between(datefrom,dateto) & Key('DeviceId').eq(devicetag);
response = table.query(
                KeyConditionExpression=fe
            )
for i in response['Items']:
    print(i)

Solution

  • You're actually very close. All that's missing is the aggregation of the items from the response.

    Here's an example for that. We first group the items by the minute and then calculate the statistics for each minute.

    import statistics
    import itertools
    
    # Sample data
    response = {
        "Items": [
            {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 86, "CurrentTime": "2021-03-13 14:50:17.123"},
            {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 100, "CurrentTime": "2021-03-13 14:50:18.123"},
            {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 19, "CurrentTime": "2021-03-13 14:51:17.123"},
        ]
    }
    
    # Group the response by items per minute
    items_by_minute = itertools.groupby(
        response["Items"], 
        key=lambda x: x["CurrentTime"][:16]  #  The first 16 characters including the minute
    )
    
    # Calculate the statistics for each minute
    for minute, items in items_by_minute:
        values_per_minute = [item["SensorValue"] for item in items]
        
        avg = statistics.mean(values_per_minute)
        min_value = min(values_per_minute)
        max_value = max(values_per_minute)
        
        print(f"Minute: {minute} / Average {avg} / Min {min_value} / Max {max_value}")
    

    Output

    Minute: 2021-03-13 14:50 / Average 93 / Min 86 / Max 100
    Minute: 2021-03-13 14:51 / Average 19 / Min 19 / Max 19