amazon-web-services, aws-aurora-serverless

How to connect an on-premises application to AWS Aurora Serverless


We have a bunch of on-premises applications each running their own local MySQL servers. Our workload is light, with occasional bursts of activity (a B2B business model with some specific times of the month in which it is more profitable to use our application, and therefore we see usage spikes during those days). We decided that it would be a good idea to simplify the infrastructure by moving all the databases into one server/cluster, and after some discussion decided that buying a managed solution would be better than trying to set up and maintain our own MySQL cluster (none of us are DBAs).

We did a thorough amount of research, and eventually settled on Amazon Aurora Serverless as a solid candidate for its auto-scaling capabilities, and therefore (potentially) lower cost compared to the alternatives we examined (AWS MySQL RDS and DigitalOcean managed MySQL), due to our usually-light workload with occasional bursts of activity.

However, from what I can gather, it is not possible to simply connect to AWS Aurora Serverless from our on-premises applications (see "Not able connect Amazon Aurora Serverless from SQL client" for example), so my questions are:

  1. What is the best-practice, modern way to solve this problem - should we use a site-to-site VPN to connect our on-premises hosts to the cloud? Would this end up costing us significantly more?
  2. Is Aurora Serverless really the best solution at all, or should we fall back to Amazon RDS, or DigitalOcean's managed MySQL cluster, both of which allow assigning public IPs but neither of which will auto-scale (meaning we'd need to buy a tier based on our peak usage, and potentially waste a lot of money as it will sit almost idle for a large part of the month)?

What we want to achieve is a simple, fire-and-forget MySQL cluster setup that's managed by someone else, ideally auto-scales, and doesn't cost the earth or end up being more difficult to manage than the current, on-premises solution.

We are not cloud-averse, but neither do we want to suddenly start moving everything into the cloud all at once just for the sake of a simpler database infrastructure.

To throw an extra spanner into the works, we don't manage our own firewalls - so setting up a site-to-site VPN could be tricky and involve coordinating with a third party (our network provider). Ideally I'd like to avoid this hassle too, if at all possible.


Solution

  • I understand that you have some questions around hybrid cloud architectures with regard to Amazon Aurora Serverless. This is a really tough topic and might easily be seen as opinionated (luckily the community left this open though). So I will try to reference as much public material as possible and explain how I would think about it if I had to design this kind of setup.

    As a disclaimer, I am not an AWS official. However, I have been building and operating cloud applications in the startup industry for the last three years... And coincidentally I have a couple of minutes, so here are my thoughts:

    1. Problem Statement

    Aurora Serverless is accessible through VPC Interface Endpoints [1]:

    Each Aurora Serverless DB cluster requires two AWS PrivateLink endpoints. If you reach the limit for AWS PrivateLink endpoints within your VPC, you can't create any more Aurora Serverless clusters in that VPC.

    According to the docs [1], as you already pointed out correctly, these endpoints are a private construct:

    You can't give an Aurora Serverless DB cluster a public IP address. You can access an Aurora Serverless DB cluster only from within a virtual private cloud (VPC) based on the Amazon VPC service.

    2. Question Scope

    Your questions involve the best-practices (Q1), the cost aspects (also Q1) and the functional differences to other database options in the cloud (Q2), e.g. public access via the internet and auto scaling.

    These are all valid questions when migrating database workloads into the public cloud. But at the same time, they are only a subset of questions that should be considered.
    As far as I understand, we have three challenges here that should be clearly highlighted: (C1) you are initiating a migration to the cloud, (C2) you are about to turn your existing workload into a hybrid workload and (C3) you are performing a database migration. All three are generally big topics on their own, and decisions about them should not be made prematurely. However, if your workload is, as you described, "light", the risk of doing them all together might be acceptable. That risk is not something I am able to assess in the following.

    So let's focus on the very basic questions that come to mind when I look at challenges (C1) - (C3) described above:

    3. Is a hybrid workload acceptable? (C2)

    I think the main question you should ask yourself is whether the on-premises workload can be transformed into a hybrid workload. Consequently you should think about the impact of placing your database far away from your clients with regard to latency and reliability. Furthermore you should evaluate whether the new database engine fits your performance expectations (e.g. scaling up fast enough for peak traffic) [3] and whether database compatibility and limitations are acceptable [4].

    Usually a connection into the cloud (possibly over an external network carrier) is less reliable than a bunch of cables on-premises. Maybe your workload is even so small that the DB and its clients are running on the same hypervisor/machine. In that case, moving things far apart (connected over a third-party network) should definitely be considered carefully.

    For a workload to be reliable and/or highly available, it is not enough that Aurora meets these standards (which it does); your network connection has to meet them too.
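
To make that point concrete, here is a small worked sketch in Python. It assumes that the database and the network link fail independently and that both must be up for the application to work, so their availabilities multiply. The function names and the availability figures are hypothetical and chosen only for illustration; they are not published numbers for Aurora or for any carrier.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, which is a simplification.
    """
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"availability must be within [0, 1], got {value}")
    return prod(components)


def downtime_hours_per_month(availability: float, hours_in_month: float = 730.0) -> float:
    """Expected unavailable hours in an average month (730 h is an approximation)."""
    return (1.0 - availability) * hours_in_month


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    database = 0.9995   # assumed availability of the managed database
    wan_link = 0.995    # assumed availability of the office internet link
    combined = serial_availability(database, wan_link)
    print(f"combined availability: {combined:.4%}")
    print(f"expected downtime: {downtime_hours_per_month(combined):.1f} h/month")
```

With inputs like these the combined figure is dominated by the weaker component, which is why the quality of the on-premises internet link matters as much as the database service itself.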

    When you ask yourself the right questions, you automatically start to characterise your workload. AWS has published a number of public guidelines to aid you in this process.
    There is the Well-Architected Framework [10] and the Well-Architected Tool [11], the latter being the "automated" way to apply the framework. As an example, the Reliability Pillar [9] contains thoughts and expertise from AWS experts that help you really question your hybrid approach.

    Moreover, AWS publishes so called Lenses [13] to discuss specific workload types from the well-architected perspective. As you asked for the best-practices (Q1), I want to point out that currently there is no detailed guideline/lens for the type of workload you described.

    However, there is an Aurora guide called "Performing a Proof of Concept with Amazon Aurora" in the docs [12]. (more information below in section "Aurora POC Guide")

    I worked on applications in the past which use the database layer heavily and thus could not undergo a change like that without a major refactoring...
    Which brings me to the second point: Migration Strategy.

    4. What is the acceptable migration strategy? (C1)

    Since this is a database migration, there are two major questions you should ask yourself: (a) to what degree do you want to migrate (called the 6R's of migration - a general concept which is independent from databases) and (b) how to lift the database parts into the cloud (especially data). I do not want to go into detail here since it is highly dependent on your workload characteristics.

    AWS has published a detailed guideline which aids you with these decisions. [15]
    It mentions some useful tools such as the AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT), which help you to convert your schema properly (if necessary) and to move your data from the source database cluster into the target database cluster (optionally as an "online"/"live" migration without downtime).

    I want to highlight once again that there is a major decision you have to make: replatforming vs. rearchitecting the application (i.e. the database clients). I guess you can make Aurora Serverless work with only a small number of changes, but in order to take full advantage of Aurora capabilities, rearchitecting is probably necessary (which may end in moving the whole workload into the cloud anyway).

    If you decide to do a partial refactoring of your application, you could use the so-called Data API as well. The Data API for Aurora Serverless [7][8] makes it possible to send queries directly over the public internet. It might be a valid fit for you if (i) you can afford to refactor some parts of your application code and (ii) your application's characteristics fit the Data API. The Data API takes a completely different approach to database connection management and thus suits some serverless use cases very well. However, it might not suit traditional database workloads with long-held or heavily used connections. You should also note the database engine compatibility for the Data API ("Availability of the Data API" [7]).
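
As a rough illustration of what the Data API route could look like from application code, here is a minimal Python sketch. It assumes the boto3 SDK and its `rds-data` client with the `execute_statement` call; the ARNs, region, database name, table name and all helper function names are placeholders invented for this example, and the Data API has to be enabled on the cluster first. Treat it as a sketch under those assumptions rather than a recommended implementation.

```python
from typing import Any, Dict, List, Optional


def to_sql_parameters(params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a plain dict into the typed parameter list the Data API expects."""
    converted = []
    for name, value in params.items():
        if value is None:
            field: Dict[str, Any] = {"isNull": True}
        elif isinstance(value, bool):  # check bool first: bool is a subclass of int
            field = {"booleanValue": value}
        elif isinstance(value, int):
            field = {"longValue": value}
        elif isinstance(value, float):
            field = {"doubleValue": value}
        else:
            field = {"stringValue": str(value)}
        converted.append({"name": name, "value": field})
    return converted


def build_request(cluster_arn: str, secret_arn: str, database: str, sql: str,
                  params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble the keyword arguments for execute_statement."""
    return {
        "resourceArn": cluster_arn,   # ARN of the Aurora Serverless cluster
        "secretArn": secret_arn,      # Secrets Manager secret holding the DB credentials
        "database": database,
        "sql": sql,                   # named placeholders use the :name syntax
        "parameters": to_sql_parameters(params or {}),
    }


def run_query(request: Dict[str, Any], region: str) -> Dict[str, Any]:
    """Send the statement over HTTPS; no VPN and no open database connection needed."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK
    client = boto3.client("rds-data", region_name=region)
    return client.execute_statement(**request)


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own before calling run_query().
    request = build_request(
        cluster_arn="arn:aws:rds:eu-west-1:123456789012:cluster:example-cluster",
        secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:example",
        database="appdb",
        sql="SELECT id, name FROM customers WHERE id = :id",
        params={"id": 42},
    )
    print(request)
```

Note how there is no connection object to open, pool or close: every statement is an independent, IAM-authenticated request. That is the main reason existing code written around long-lived MySQL connections needs refactoring before it can use this route.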

    5. Decision Making

    I think technically it should be no issue to access Aurora Serverless. You have basically four connectivity options: (a) directly over the internet, (b) over an AWS managed (site-to-site) VPN connection, (c) over an EC2 instance based VPN connection and (d) over Direct Connect (abbreviated DX).

    • Option (a) is only possible if you rearchitect your application to work with the Data API AFAIK.
    • Option (d) should be supported but is the most expensive in terms of fixed costs. It should be supported because AWS Interface Endpoints (the entry points into Aurora Serverless) are accessible via DX.
    • Option (c) should be supported according to experts here on SO. [19]
    • Option (b) was certainly not supported at the beginning - but as far as I understand, could be now. This is because AWS PrivateLink (the technology underpinning AWS Interface Endpoints) supports connections from on-premises via AWS managed VPN since September 2018. [17]

    Additionally, you possibly have to forward DNS queries from on-premises into the cloud in order to resolve the VPC Interface Endpoints properly. [18]
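
Once one of the private connectivity options (b) - (d) is in place, a quick check from an on-premises host can tell you whether DNS forwarding and routing work as intended. The following Python sketch uses only the standard library; the function names are made up for this example, and the port 3306 is an assumption based on the MySQL default. The idea is that the cluster endpoint should resolve to private addresses inside the VPC and accept a TCP connection.

```python
import ipaddress
import socket
from typing import Dict, List


def is_private_address(address: str) -> bool:
    """True if the address lies in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(address).is_private


def resolve_ipv4(hostname: str, port: int = 3306) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(hostname: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Try a plain TCP connection; this does not log in to MySQL."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoint(hostname: str, port: int = 3306) -> Dict[str, object]:
    """Summarise DNS resolution and TCP reachability for a cluster endpoint."""
    try:
        addresses = resolve_ipv4(hostname, port)
    except socket.gaierror:
        return {"resolved": [], "all_private": False, "reachable": False}
    return {
        "resolved": addresses,
        "all_private": all(is_private_address(a) for a in addresses),
        "reachable": can_connect(hostname, port),
    }
```

Calling `check_endpoint` with your own cluster endpoint from an on-premises machine gives a quick first diagnosis: an empty `resolved` list points at DNS forwarding, while private addresses with `reachable` set to `False` point at routing, security groups or the VPN tunnel.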

    You should characterise your workload, specify the minimal requirements with regard to security, reliability, performance (see Well-Architected Framework) and finally look at the most cost-effective approach to accomplish it. In a B2B model, I would not compromise these three to achieve cost reduction (see my opinion in the section below).

    You basically have two options:

    1. doing the work on your own (which is hopefully a bit easier with the material referenced in this post)
    2. asking for help from an AWS Solutions Architect, either at AWS or at an external company

    This is purely a tradeoff between (1) the time it takes to figure all this out and get it working, (2) the costs (i.e. operating costs for the implemented solution and costs for consultation) and (3) the financial risk involved when something goes wrong during the migration.

    As you state in the question "moving everything into the cloud", I guess you are at the beginning of the cloud journey. The official AWS papers state the following for companies in that situation:

    If your business is new to AWS, consider a managed service provider, such as AWS Managed Services, to build out and manage the platform. [14]

    Having a background in the startup industry, I understand that this is by no means an option for smaller companies, but I just wanted to mention that it exists.

    6. Conclusion / My Opinion(!)

    Exposing a database to the internet is a practice best avoided. That is not just my own opinion, but that of others here on SO too. [19]

    I would try to go (as a bare minimum!) with the AWS managed VPN approach and set up a redundant VPN connection between on-premises and the cloud.

    Why do I state "bare minimum"?
    Because a proper solution would probably be to move the whole workload into the cloud. However, if this is not possible, I would try to reduce the risk involved in establishing a hybrid workload. A managed VPN connection is probably the most cost-effective way for small workloads to reduce the risk from a security perspective.

    From my experience:
    For the last three years, I operated a SaaS application which was fully built in the AWS cloud. We had several outages of our network carrier since then. I would never trust them enough to establish some sort of hybrid architecture. Not for the type of workload we are offering (SaaS Webapp in B2B sector) and the internet contract/connectivity we have ATM. Never. However, the situation might be a completely different one for you - especially if you are already hosting services from your datacenter/office without reliability issues for a long time.

    If you have read this far, you are probably asking yourself why someone would ever want to write such an essay. Well, I am just preparing for the AWS Certified Database Specialty [20] and this is a good opportunity to do some serious research, take some notes and collect some sources/references. I want to endorse the various AWS Certification Paths [16] and the ecosystem of learning platforms around them. There is so much very informative stuff published by AWS.

    Hopefully you found something interesting in this post for yourself too.

    A. Aurora POC Guide

    The guide mentions that when doing a database migration to Aurora, one should consider:

    • rewriting some parts of the client application code, especially to properly use the DNS endpoints [5][6] and connection pooling [5]

    • doing a schema conversion if migrating from a rather complex (proprietary) source DB engine ("Port Your SQL Code" [12])

    • (optionally) incorporating some Aurora-specific changes to make the migration more effective (applicable to a Rearchitect type of migration) [2]:

      • To take full advantage of Aurora capabilities for distributed parallel execution, you might need to change the connection logic. Your objective is to avoid sending all read requests to the primary instance. The read-only Aurora Replicas are standing by, with all the same data, ready to handle SELECT statements. Code your application logic to use the appropriate endpoint for each kind of operation. Follow these general guidelines:
      • Avoid using a single hard-coded connection string for all database sessions.
      • If practical, enclose write operations such as DDL and DML statements in functions in your client application code. That way, you can make different kinds of operations use specific connections.
      • Make separate functions for query operations. Aurora assigns each new connection to the reader endpoint to a different Aurora Replica to balance the load for read-intensive applications.
      • For operations involving sets of queries, close and reopen the connection to the reader endpoint when each set of related queries is finished. Use connection pooling if that feature is available in your software stack. Directing queries to different connections helps Aurora to distribute the read workload among the DB instances in the cluster.
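
The guidelines quoted above can be sketched in a few lines of Python. This is a deliberately simple illustration, assuming a cluster that exposes both a writer and a reader endpoint; the class name, function names and hostnames are hypothetical, and classifying a statement by its first keyword is a rough heuristic rather than a full SQL parser.

```python
from dataclasses import dataclass

# Statement prefixes treated as read-only in this sketch.
_READ_PREFIXES = ("select", "show", "describe", "explain")


@dataclass(frozen=True)
class ClusterEndpoints:
    """Holds both endpoints instead of one hard-coded connection string."""
    writer: str
    reader: str


def is_read_only(sql: str) -> bool:
    """Rough heuristic: look at the first keyword of the statement."""
    statement = sql.strip().lower()
    if "for update" in statement:
        # SELECT ... FOR UPDATE takes row locks and belongs on the writer.
        return False
    return statement.startswith(_READ_PREFIXES)


def endpoint_for(sql: str, endpoints: ClusterEndpoints) -> str:
    """Pick the reader endpoint for queries and the writer for everything else."""
    return endpoints.reader if is_read_only(sql) else endpoints.writer


if __name__ == "__main__":
    # Hypothetical hostnames; use the endpoints shown for your own cluster.
    endpoints = ClusterEndpoints(
        writer="example.cluster-abc.eu-west-1.rds.amazonaws.com",
        reader="example.cluster-ro-abc.eu-west-1.rds.amazonaws.com",
    )
    for sql in ("SELECT * FROM orders", "UPDATE orders SET paid = 1 WHERE id = 7"):
        print(sql, "->", endpoint_for(sql, endpoints))
```

In a real application the routing decision would usually live in the data-access layer or in the connection pool configuration, and reads that are part of a write transaction would still need to go to the writer.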

    References

    [1] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.limitations
    [2] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-poc.html#Aurora.PoC.Connections
    [3] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-poc.html#Aurora.PoC.Measurement
    [4] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.limitations
    [5] https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-mysql-database-administrator-handbook.pdf
    [6] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Connecting.html
    [7] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
    [8] https://www.youtube.com/watch?v=I0uHo4xAIxg#t=12m30s
    [9] https://d1.awsstatic.com/whitepapers/architecture/AWS-Reliability-Pillar.pdf
    [10] https://aws.amazon.com/architecture/well-architected/
    [11] https://aws.amazon.com/de/well-architected-tool/
    [12] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-poc.html
    [13] https://aws.amazon.com/blogs/architecture/well-architected-lens-focus-on-specific-workload-types/
    [14] https://d1.awsstatic.com/whitepapers/Migration/aws-migration-whitepaper.pdf
    [15] https://docs.aws.amazon.com/prescriptive-guidance/latest/database-migration-strategy/database-migration-strategy.pdf
    [16] https://aws.amazon.com/training/learning-paths/
    [17] https://aws.amazon.com/about-aws/whats-new/2018/09/aws-privatelink-now-supports-access-over-aws-vpn/
    [18] https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-forwarding-inbound-queries.html
    [19] https://stackoverflow.com/a/52842424/10473469
    [20] https://aws.amazon.com/de/certification/certified-database-specialty/