Search code examples
mysqlamazon-web-servicesasp.net-coreamazon-rdsamazon-ebs

Pomelo MySQL (.NET Core) Can't Recover After Database Failure


Last night AWS RDS had an "Internet Connectivity Issue" that was resolved a short time later. However, my app (which runs in .NET Core and connects to an RDS MySQL instance via Pomelo.EntityFrameworkCore.MySql) could never re-establish a connection to the database even though the MySQL server was back online. I tested connecting from my own local machine and it worked just fine. I then re-deployed the .NET Core app it everything started working again.

Is there something that I need to re-create (the db context perhaps), or is there something that is being cached that I need to flush to try to connect again? I connect via hostname, and my connection string looks something like this:

server=something.somewhere.us-east-2.rds.amazonaws.com;userid=XXXX;password=YYYYY;database=ZZZZ

Here is the Exception being thrown:

MySqlException: Unable to connect to any of the specified MySQL hosts.
at MySqlConnector.Core.ServerSession+<ConnectAsync>d__56.MoveNext (C:\projects\mysqlconnector\src\MySqlConnector\Core\ServerSession.cs:239)

and here is how I create my db context in Startup.cs:

services.AddDbContext<BlayFapContext>(opt => opt.UseMySql(Settings.Instance.SQLConnectionString));

Any help would be greatly appreciated.

Giawa


Solution

  • Okay, we worked out what happened. Pomelo's MySQL wrapper had an issue as outlined in their git repo here: https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/issues/434

    Basically, if a MySQL database is not available when the connection string is first used then it will be cached as invalid and will never work again. You can easily confirm this by launching a service with no MySQL connectivity, verify it doesn't work, then launch MySQL and confirm that the service still doesn't work. It can never establish a MySQL connection after the first connection string is found to be invalid.

    They patched it shortly after the 2.0.1 release, but they haven't updated Nuget with a new version since then, despite the issue being found 6 months ago. So, the fix is to checkout their repository source code, and patch it ourselves. We found the fix here works just fine: https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/pull/456

    So, why was the connection string retried? We already had a successful connection! It turns out that the internet connectivity issue with the Ohio data center was not limited to RDS, but also affected EC2. Our EC2 instance was rebooted as part of the fix, and the MySQL connection wasn't valid when it reboot due to the continued connectivity issues. The state of that connection was cached, and even though the MySQL server came back online our service was toast.

    Giawa