I have a table in DynamoDB in the below format:
DeviceId (PK) | SensorDataType | SensorValue | CurrentTime (SK) |
---|---|---|---|
BSMD002 | HeartRate | 86 | 2021-03-13 14:50:17.292663 |
BSMD002 | HeartRate | 106 | 2021-03-13 14:50:17.564644 |
BSMD002 | HeartRate | 97 | 2021-03-13 14:50:17.854391 |
I am pulling the data from this table using boto3 and want to create a new table basis user input ( DeviceId, Date Range).This table will have sensortype wise data - Max,min and avg grouped by minute.
I know DynamoDB doesn't support aggregation and using Streams + Lambda is the more efficient way. But want to understand if there is a way to do this in boto3. So far have worked pulling the data as per below code.
import boto3
import time
from datetime import datetime, timedelta
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('BSMDataTable')
devicetag = input(" Enter the Device ID to find: ").upper()
datefrom = input("Enter Starting Date in YYYY-MM-DD format: ")
dateto = input("Enter Ending Date in YYYY-MM-DD format: ")
fe = Key('CurrentTime').between(datefrom,dateto) & Key('DeviceId').eq(devicetag);
response = table.query(
KeyConditionExpression=fe
)
for i in response['Items']:
print(i)
You're actually very close. All that's missing is the aggregation of the items from the response.
Here's an example for that. We first group the items by the minute and then calculate the statistics for each minute.
import statistics
import itertools
# Sample data
response = {
"Items": [
{"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 86, "CurrentTime": "2021-03-13 14:50:17.123"},
{"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 100, "CurrentTime": "2021-03-13 14:50:18.123"},
{"DeviceId": "BSMD002", "SensorDataType": "HeartRate", "SensorValue": 19, "CurrentTime": "2021-03-13 14:51:17.123"},
]
}
# Group the response by items per minute
items_by_minute = itertools.groupby(
response["Items"],
key=lambda x: x["CurrentTime"][:16] # The first 16 characters including the minute
)
# Calculate the statistics for each minute
for minute, items in items_by_minute:
values_per_minute = [item["SensorValue"] for item in items]
avg = statistics.mean(values_per_minute)
min_value = min(values_per_minute)
max_value = max(values_per_minute)
print(f"Minute: {minute} / Average {avg} / Min {min_value} / Max {max_value}")
Output
Minute: 2021-03-13 14:50 / Average 93 / Min 86 / Max 100
Minute: 2021-03-13 14:51 / Average 19 / Min 19 / Max 19