amazon-web-services aws-lambda amazon-dynamodb amazon-rds serverless

AWS DynamoDB vs RDS for Lambda serverless architecture


I am part of a team currently developing a Proof of Concept architecture/application for a communication service between governmental offices and the public (narrowed down to the health sector for now). The customer has specifically requested a mainly serverless approach through AWS services, and I am in need of advice on how to set up this architecture, namely the Lambda to Database relationship.

Roughly, the architecture would make use of API Gateway to handle requests, which would invoke different Lambdas, as micro-services, that access the DB.

The following image depicts a quick relationship schema. Basically, a Patient inputs a description of his Condition which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses that take Notes related to the Case.

[Image: DB Schema (linked rather than embedded; not enough reputation)]

From my research, I've gathered that in the case of RDS, there is a trade-off between security (keeping the Lambdas outside of a public VPC containing an RDS instance, foregoing security best-practices, a no-no for public sector) and performance (putting the Lambda in a private VPC with an RDS instance, and incurring heavy cold-start times due to the provisioning of ENI). The cold-start times can however be negated by pinging them with CloudWatch, which may or may not be optimal.

In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL) and unsure of whether the data is applicable to a NoSQL model. If it is, DynamoDB seems like the better approach. From my understanding though, NoSQL has less support for complex queries that involve JOINs etc. which might eliminate it as an option.

It feels as if SQL/RDS is more appropriate in terms of the data/relations, but DynamoDB causes fewer problems for Lambda/AWS services if a decent data model is found. So my question is: would it be preferable to go for a private RDS instance and try to offset the cold starts by warming up the most critical Lambdas, or is there a NoSQL model that wouldn't cause headaches for complex queries, among other things? Am I missing any key aspects that could tip the scale?


Solution

  • Let's start by clearing up some rather drastic misconceptions on your part:

    From my research, I've gathered that in the case of RDS, there is a trade-off between security (keeping the Lambdas outside of a public RDS instance, foregoing security best-practices, a no-no for public sector) and performance (putting the Lambda in a private RDS instance, and incurring heavy cold-start times). The cold-start times can however be negated by pinging them with CloudWatch, which may or may not be optimal

    1. RDS is a database server. You don't run anything inside or outside of it.
    2. You may be thinking of a VPC, or Virtual Private Cloud. This is an isolated network in which you can run your RDS instances and Lambdas.
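    3. Running inside or outside of a VPC has little impact on cold start times today. Before AWS reworked Lambda's VPC networking in 2019, a VPC-attached Lambda could pay an extra penalty while its network interface (ENI) was provisioned; that interface is now set up when the function is created or configured, not on invocation. You still pay the ordinary cold start penalty whenever AWS has to start a new container to run your Lambda, either because it hasn't been running recently or because it needs to scale to meet concurrent requests. The actual cold start time will depend on your language: Java is significantly slower than Python, for example, because it needs to start the JVM and load classes before doing anything.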
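    Whatever the cold start costs, a common way to avoid paying for setup on every request is to do expensive initialization once per container and reuse it on warm invocations. The sketch below illustrates the idea only: `connect_to_database` is a hypothetical stand-in for a real driver call, and the returned dictionary is a placeholder for a connection object.

```python
import time

# Cached at module level: it survives between invocations that reuse
# the same container, and is rebuilt only after a cold start.
_connection = None


def connect_to_database():
    """Hypothetical stand-in for a real driver call (e.g. a MySQL connect)."""
    time.sleep(0.05)  # simulate slow connection setup
    return {"connected_at": time.time()}


def get_connection():
    """Create the connection on first use, then reuse it."""
    global _connection
    if _connection is None:
        _connection = connect_to_database()
    return _connection


def handler(event, context):
    """Lambda-style entry point; only the first call pays for the connection."""
    conn = get_connection()
    return {"statusCode": 200, "connected_at": conn["connected_at"]}
```

    Calling `handler` twice in the same process returns the same `connected_at` value, which is the behaviour you would expect from a warm container.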

    Now for your actual question:

    Basically, a Patient inputs a description of his Condition which forms the basis for a Case. That Case is handled during one or many Sessions by one or many Nurses that take Notes related to the Case.

    This could be implemented in a NoSQL database such as DynamoDB. Without more information, I would probably make the Session the base document, using case ID as partition key and session ID as the sort key. If you don't understand what those terms mean, and how you would structure a document based around that key, then you probably shouldn't use DynamoDB.
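    To make the key terms concrete, here is a minimal in-memory sketch of that layout. It is plain Python, not the DynamoDB API; the attribute names (`case_id`, `session_id`, `nurse_id`, `notes`) and the sample values are assumptions for illustration. The partition key selects a group of items, and the sort key orders the items within that group.

```python
from collections import defaultdict

# One item per Session. "case_id" plays the role of the partition key,
# "session_id" the role of the sort key (all names are illustrative).
sessions = [
    {"case_id": "case-1", "session_id": "2019-03-02#s2",
     "nurse_id": "nurse-b", "notes": ["Follow-up call"]},
    {"case_id": "case-1", "session_id": "2019-03-01#s1",
     "nurse_id": "nurse-a", "notes": ["Initial assessment"]},
    {"case_id": "case-2", "session_id": "2019-03-05#s1",
     "nurse_id": "nurse-a", "notes": ["Referred to GP"]},
]


def build_table(items):
    """Group items by partition key, then index them by sort key."""
    table = defaultdict(dict)
    for item in items:
        table[item["case_id"]][item["session_id"]] = item
    return table


def query_sessions(table, case_id, session_prefix=""):
    """Return one case's sessions in sort-key order, optionally by prefix."""
    partition = table.get(case_id, {})
    return [partition[key] for key in sorted(partition)
            if key.startswith(session_prefix)]
```

    With this layout, "all sessions for case-1" is a cheap lookup on the partition key. Any question that does not start from a case ID, such as "all sessions for nurse-a", would need a secondary index or a full scan.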

    A bigger reason to not use DynamoDB has to do with access patterns. Will you ever want to find all cases worked by a given nurse? Or related to a given patient? Those types of queries are what a relational database is designed for.
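    As a rough illustration of those access patterns in a relational model, the sketch below uses Python's built-in `sqlite3` module in place of MySQL on RDS. The table and column names are assumptions based on the schema described in the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nurses   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    description TEXT NOT NULL
);
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    case_id INTEGER NOT NULL REFERENCES cases(id),
    nurse_id INTEGER NOT NULL REFERENCES nurses(id)
);
INSERT INTO patients VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO nurses   VALUES (1, 'Nora'), (2, 'Nils');
INSERT INTO cases    VALUES (1, 1, 'Persistent cough'),
                            (2, 2, 'Sprained ankle'),
                            (3, 1, 'Migraine');
INSERT INTO sessions VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")


def cases_for_nurse(conn, nurse_id):
    """All cases a nurse has worked on, found through the sessions table."""
    rows = conn.execute(
        """SELECT DISTINCT cases.id, cases.description
           FROM cases JOIN sessions ON sessions.case_id = cases.id
           WHERE sessions.nurse_id = ?
           ORDER BY cases.id""",
        (nurse_id,),
    )
    return rows.fetchall()


def cases_for_patient(conn, patient_id):
    """All cases opened for a given patient."""
    rows = conn.execute(
        "SELECT id, description FROM cases WHERE patient_id = ? ORDER BY id",
        (patient_id,),
    )
    return rows.fetchall()
```

    Both questions are answered by the same tables without any up-front decision about which lookups to support, which is the flexibility the paragraph above refers to.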

    In the case of DynamoDB, I am personally very inexperienced (more so than in MySQL)

    Do you have anyone on your team who is familiar with NoSQL databases? If not, then I think you should stick with MySQL. You will have enough challenges learning how to use Lambda.