so I've got this data of a type 'list' (this is just a fraction of it to illustrate the point) stored in a variable called sorted_trans (I have pre-sorted it to group accountIds together):
Transaction(transactionId='T000664', accountId='A1', transactionDay=21, category='GG', transactionAmount=364.69), Transaction(transactionId='T000776', accountId='A1', transactionDay=24, category='GG', transactionAmount=329.63), Transaction(transactionId='T000313', accountId='A10', transactionDay=8, category='AA', transactionAmount=960.56), Transaction(transactionId='T000472', accountId='A10', transactionDay=12, category='AA', transactionAmount=707.74), Transaction(transactionId='T000596', accountId='A10', transactionDay=18, category='AA', transactionAmount=156.77), Transaction(transactionId='T000730', accountId='A10', transactionDay=23, category='AA', transactionAmount=577.76)
My task is to sort this data by 'accountId' for the last 5 days (I'm using the rolling window) and calculate: total 'transactionAmount' per 'accountId', average 'transactionAmount' per 'accountId' and return the maximum 'transactionAmount' per 'accountId'
To capture data for the last five days (excluding the current day), I'm using a normal for-loop
for i in range(day-window_size, day):
I suspect I would need to use some sort of a comprehension to group and calculate the right values. The output should contain one line per day per account id and each line should contain each of the calculated statistics, for example: Output examle. I've managed to get everything but the average and max.
For example, I've calculated total value per category AA (same for other categories) with this code:
trans_AA = []
trans_AA.append([(x.accountId, x.transactionAmount) for x in sorted_trans if x.category == "AA"])
Generic method to calculate total transactionAmount per category
def totals_per_cat(transactions):
for accountId, transactionAmount in chain.from_iterable(transactions):
d[accountId] += transactionAmount
trans_res = list(d.values())
return trans_res
How to calculate average and max per accountId for the past 5 days?
This will hopefully get you most of the way there:
from collections import defaultdict
from dataclasses import dataclass, fields
from csv import DictReader
from statistics import mean
from typing import Dict, List
@dataclass
class Transaction:
transactionId: str
accountId: str
transactionDay: int
category: str
transactionAmount: float
def load_transactions(file_path: str) -> List[Transaction]:
with open(file_path) as csv_file:
return [
Transaction(**{
field.name: field.type(row[field.name])
for field in fields(Transaction)
}) for row in DictReader(csv_file)
]
transactions = load_transactions('transactions.txt')
min_day = min(xact.transactionDay for xact in transactions)
max_day = max(xact.transactionDay for xact in transactions)
window_size = 5
xact_by_acct: Dict[str, List[Transaction]] = defaultdict(list)
for xact in transactions:
xact_by_acct[xact.accountId].append(xact)
for day in range(min_day + window_size, max_day + 1):
for acct, xacts in xact_by_acct.items():
amts = [
xact.transactionAmount for xact in xacts
if xact.transactionDay in range(day - window_size, day)
] # all amounts within the window
try:
print(
f"Day: {day}\tAccount: {acct}\t"
f"Max: {max(amts)}\tAverage: {mean(amts)}"
)
except ValueError:
# No transactions in the window for this account.
pass