amazon-web-services amazon-dynamodb boto3 dynamodb-queries

Boto3 and DynamoDB - How to mimic Aggregation

I have a table in DynamoDB in the below format:

DeviceId (PK)	SensorDataType	SensorValue	CurrentTime (SK)
BSMD002	HeartRate	86	2021-03-13 14:50:17.292663
BSMD002	HeartRate	106	2021-03-13 14:50:17.564644
BSMD002	HeartRate	97	2021-03-13 14:50:17.854391

I am pulling the data from this table using boto3 and want to create a new table basis user input ( DeviceId, Date Range).This table will have sensortype wise data - Max,min and avg grouped by minute.

I know DynamoDB doesn't support aggregation and using Streams + Lambda is the more efficient way. But want to understand if there is a way to do this in boto3. So far have worked pulling the data as per below code.

import boto3
import time
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('BSMDataTable')

devicetag = input(" Enter the Device ID to find: ").upper()
datefrom = input("Enter Starting Date in YYYY-MM-DD format: ")
dateto = input("Enter Ending Date in YYYY-MM-DD format: ")

fe = Key('CurrentTime').between(datefrom,dateto) & Key('DeviceId').eq(devicetag);
response = table.query(
                KeyConditionExpression=fe
            )
for i in response['Items']:
    print(i)

Solution

You're actually very close. All that's missing is the aggregation of the items from the response.

Here's an example for that. We first group the items by the minute and then calculate the statistics for each minute.

import statistics
import itertools

# Sample data
response = {
    "Items": [
        {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 86, "CurrentTime": "2021-03-13 14:50:17.123"},
        {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 100, "CurrentTime": "2021-03-13 14:50:18.123"},
        {"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 19, "CurrentTime": "2021-03-13 14:51:17.123"},
    ]
}

# Group the response by items per minute
items_by_minute = itertools.groupby(
    response["Items"], 
    key=lambda x: x["CurrentTime"][:16]  #  The first 16 characters including the minute
)

# Calculate the statistics for each minute
for minute, items in items_by_minute:
    values_per_minute = [item["SensorValue"] for item in items]
    
    avg = statistics.mean(values_per_minute)
    min_value = min(values_per_minute)
    max_value = max(values_per_minute)
    
    print(f"Minute: {minute} / Average {avg} / Min {min_value} / Max {max_value}")

Output

Minute: 2021-03-13 14:50 / Average 93 / Min 86 / Max 100
Minute: 2021-03-13 14:51 / Average 19 / Min 19 / Max 19