Hi, I am attempting to build a fintech app using a template I found here: https://github.com/aws-samples/aws-plaid-demo-app
I am looking to modify the database to better fit my needs and have a few questions about best practices for implementation before I start making major changes.
The demo app seems to combine all of the data into a single table, even though it is storing different things with different pks/sks and data schemas. Is this ok to do in production? I will admit it is a bit confusing when trying to read the table in the console, but I am trying to minimize deviations from the demo app (and therefore large rewrites) where possible. Should I set up different tables for different items?
I need to add a few features to my app. The first is a sort of truth table that contains information about different credit cards and is the same for all users. Should I add this information directly into the same Dynamo table? How can I easily edit/update this information and allow all users the ability to query it?
The main query I will be making will be for transactions, and these need to be searchable by transaction name, merchant name, card, type, date, and category (which are all stored within the transaction table). Do I need to create different indexes for this? For the graph I need to be able to sort by card and category (and set a date range) very quickly. Would love any advice on this.
Here is the template.yml that shows the table info:
Table:
  Type: "AWS::DynamoDB::GlobalTable"
  UpdateReplacePolicy: Delete
  DeletionPolicy: Delete
  Properties:
    AttributeDefinitions:
      - AttributeName: pk
        AttributeType: S
      - AttributeName: sk
        AttributeType: S
      - AttributeName: gsi1pk
        AttributeType: S
      - AttributeName: gsi1sk
        AttributeType: S
    BillingMode: PAY_PER_REQUEST
    GlobalSecondaryIndexes:
      - IndexName: GSI1
        KeySchema:
          - AttributeName: gsi1pk
            KeyType: HASH
          - AttributeName: gsi1sk
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
    KeySchema:
      - AttributeName: pk
        KeyType: HASH
      - AttributeName: sk
        KeyType: RANGE
    Replicas:
      - PointInTimeRecoverySpecification:
          PointInTimeRecoveryEnabled: true
        Region: !Ref "AWS::Region"
        TableClass: STANDARD
        Tags:
          - Key: GITHUB_ORG
            Value: !Ref GitHubOrg
          - Key: GITHUB_REPO
            Value: !Ref GitHubRepo
          - Key: Environment
            Value: !Ref Environment
    SSESpecification:
      SSEEnabled: true
    StreamSpecification:
      StreamViewType: NEW_AND_OLD_IMAGES
    TimeToLiveSpecification:
      AttributeName: expire_at
      Enabled: true
I am a CS undergrad (and have taken a DB class), but feel like I could really use some advice on implementation strategies for building a real product.
Thank you!!!!
You are free to set up as many tables as you like. Having multiple entities in a single table is only useful when you need to read them in a single request. If you don't intend to read the data together, go with multiple tables to reduce complexity.
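To illustrate the single-request case, here is a minimal sketch. It assumes a hypothetical key layout in which every item belonging to one user shares pk = "USER#<id>" and the entity type is the prefix of sk. The prefixes and item shapes are made up for illustration and are not taken from the demo app; the items are shown as plain, already-deserialized dicts.

```python
from collections import defaultdict


def group_by_entity(items):
    """Group the items returned by one Query on a single pk by their sk prefix."""
    groups = defaultdict(list)
    for item in items:
        # The text before the first '#' in sk is treated as the entity type.
        entity_type = item["sk"].split("#", 1)[0]
        groups[entity_type].append(item)
    return dict(groups)


# Hypothetical result of a single Query with pk = "USER#42".
items = [
    {"pk": "USER#42", "sk": "ACCOUNT#chk-1", "name": "Checking"},
    {"pk": "USER#42", "sk": "TRANSACTION#2024-01-05#t1", "amount": "12.50"},
    {"pk": "USER#42", "sk": "TRANSACTION#2024-01-07#t2", "amount": "40.00"},
]
grouped = group_by_entity(items)
```

If your application never needs accounts and transactions back in the same response like this, the single-table layout buys you little and separate tables are easier to reason about.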
The same applies here: use the same table if you are going to read this data along with other entities. DynamoDB is schemaless, so you can add items whenever you need to; apart from the key attributes, you don't have to define their attributes in advance.
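If you do keep the shared credit card data in the same table, one possible layout is to put every card under a single fixed partition key that is not tied to any user. The sketch below assumes a hypothetical key value "CARD_CATALOG", a hypothetical "CARD#" sort key prefix, and made-up attributes (name, issuer). It only builds the request shapes used by the low-level boto3 DynamoDB client (put_item and query) and does not call AWS.

```python
CATALOG_PK = "CARD_CATALOG"  # hypothetical partition key shared by all users


def card_item(card_id, name, issuer):
    """Build a catalog item in the typed format of the low-level client."""
    return {
        "pk": {"S": CATALOG_PK},
        "sk": {"S": f"CARD#{card_id}"},
        "name": {"S": name},
        "issuer": {"S": issuer},
    }


def catalog_query_params(table_name):
    """Build Query parameters that return every card in the catalog."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": CATALOG_PK},
            ":prefix": {"S": "CARD#"},
        },
    }


item = card_item("visa-basic", "Basic Visa", "Example Bank")
params = catalog_query_params("my-table")  # table name is a placeholder
```

Editing a card is then a put_item with the same pk and sk, which overwrites the existing item. Because every user reads the same partition key, caching the catalog in the application may be worth considering if it is read often.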
Do you actually need to query by all of those attributes? If you do, then yes, you need an index for each of them. Alternatively, you can replicate the data into OpenSearch or use a relational database.
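For the "by card within a date range" pattern, one index can serve the query if the card goes into the index partition key and the date leads the index sort key. This is a sketch under assumptions: the index name GSI2, the attributes gsi2pk and gsi2sk, and the key formats are all hypothetical (the template above only defines GSI1), dates are ISO 8601 strings (YYYY-MM-DD) so that they sort chronologically, and transaction IDs are plain ASCII. The code only builds parameters in the shape the low-level boto3 client's query call accepts.

```python
def transaction_index_keys(user_id, card_id, date, transaction_id):
    """Index attributes to store on each transaction item (hypothetical format)."""
    return {
        "gsi2pk": f"USER#{user_id}#CARD#{card_id}",
        "gsi2sk": f"DATE#{date}#TXN#{transaction_id}",
    }


def card_date_range_query(table_name, user_id, card_id, start_date, end_date):
    """Build Query parameters for one card's transactions between two dates."""
    return {
        "TableName": table_name,
        "IndexName": "GSI2",  # hypothetical index name
        "KeyConditionExpression": "gsi2pk = :pk AND gsi2sk BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{user_id}#CARD#{card_id}"},
            ":lo": {"S": f"DATE#{start_date}"},
            # '~' sorts after ASCII letters and digits, so this upper bound
            # includes every transaction dated end_date.
            ":hi": {"S": f"DATE#{end_date}#~"},
        },
    }


params = card_date_range_query("my-table", "42", "visa-1", "2024-01-01", "2024-01-31")
values = params["ExpressionAttributeValues"]
last_day = transaction_index_keys("42", "visa-1", "2024-01-31", "t9")
next_day = transaction_index_keys("42", "visa-1", "2024-02-01", "t10")
```

A category-first variant would follow the same shape with the category in the partition key. Free-text matching on transaction or merchant name does not fit this kind of key design, which is where OpenSearch or a relational database comes in.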
My advice is to use NoSQL Workbench to design your data model. It comes with DynamoDB Local, which you can test your application against for free; only then will you know whether your model is efficient. Modelling data for NoSQL is an iterative process that takes some time to master, so stick with it.