c# .net networking tcplistener sslstream

C# TcpListener and MySqlConnection stop accepting connections after a while


I have an async socket server written in C#, hosted on an Amazon Lightsail instance running Amazon Linux. It uses a TcpListener to accept connections; when someone connects, it starts a new thread to handle them, negotiates an SSL connection, and then acts as the server for an online game.

This server works fine at first, but after anywhere from 22 hours to one week all networking on the server suddenly stops working. The symptoms are as follows:

  1. Anyone already connected to the server suddenly stops receiving and sending data. I can see in the logs that my inactivity-checking code eventually kicks them for not sending heartbeat packets.
  2. The server also becomes unable to connect to its MySQL database, which runs on the same machine, so it apparently cannot even connect to localhost. (I can still access the database through phpMyAdmin during this time.)
  3. It is, however, still able to write to both files and the console, as my logger keeps writing to both.

The code looks much like everyone else's. (I did try the changes suggested for this question, but the server still failed after ~24 hours.) No errors are logged and no exceptions precede the failure, so it looks like the server never encounters an exception, which is why I've had trouble figuring this one out.

For completeness, here is my main loop:

public void ListenLoop()
{
    TcpListener listener = new TcpListener(IPAddress.Any, 26000);
    listener.Start();

    while (true)
    {
        try
        {
            if (listener.Pending())
            {
                listener.BeginAcceptTcpClient(new AsyncCallback(AcceptConnection), listener);
                Logger.Write(Logger.Level.INFO, "continuing the main loop");
            }
            // Yield so we're not stuck in a busy-loop
            Thread.Sleep(5);
        }
        catch (Exception e)
        {
            Logger.Write(Logger.Level.ERROR, $"Error while waiting for listeners: {e.Message}\n{e.StackTrace}");
        }
    }
}

And here is the accept logic:

/// <summary>
/// Finish an async callback but spawn a new thread to handle it if necessary
/// </summary>
/// <param name="ar">The async result passed back by BeginAcceptTcpClient</param>
private void AcceptConnection(IAsyncResult ar)
{
    if (ar.CompletedSynchronously)
    {
        // Force the accept logic to run async, to keep our listening
        // thread free.
        Action accept = () => AcceptCallback(ar);
        accept.BeginInvoke(accept.EndInvoke, null);
    }
    else
    {
        AcceptCallback(ar);
    }
}

private void AcceptCallback(IAsyncResult ar)
{
    try
    {
        TcpListener listener = (TcpListener) ar.AsyncState;
        TcpClient client = listener.EndAcceptTcpClient(ar);
        // If the SSL connection takes longer than 5s we have a problem, and should stop
        client.Client.ReceiveTimeout = 5000;

        // Attempt to get the IP address of the client we're connecting to
        IPEndPoint ipep = (IPEndPoint)client.Client.RemoteEndPoint;
        string ip = ipep.Address.ToString();
        Logger.Write(Logger.Level.INFO, $"Connection begun to {ip}");

        // Authenticate and begin communicating with the client
        SslStream stream = new SslStream(client.GetStream(), false);
        try
        {
            stream.AuthenticateAsServer(
                serverCertificate,
                enabledSslProtocols: System.Security.Authentication.SslProtocols.Tls12,
                clientCertificateRequired: false,
                checkCertificateRevocation: true
                );

            stream.ReadTimeout = 3600000;
            stream.WriteTimeout = 3600000;

            NetworkPlayer player = new NetworkPlayer();
            player.Name = ip;
            player.Connection.Stream = stream;
            player.Connection.Connected = true;
            player.Connection.Client = client;
            stream.BeginRead(player.Connection.Buffer, 0, 1024, new AsyncCallback(ReadCallback), player);
        }
        catch (Exception e)
        {
            Logger.Write(Logger.Level.ERROR, $"Error while starting the connection to {ip}: {e.Message}");
            // The following code just calls stream.Close(); and client.Close(); but sends exceptions to my logger.
            CloseConnectionSafely(client, stream);
        }
    }
    catch (Exception e)
    {
        Logger.Write(Logger.Level.ERROR, $"Error while starting a connection to an unknown user: {e.Message}");
    }
}

Solution

  • What I found, after consulting some people more familiar with C# than I am, is that I was running into thread pool exhaustion. Essentially, I had a number of other async tasks (not shown in the question, as they didn't look like they could cause what I was seeing) that were stuck on extremely long I/O operations, talking to users who had either disconnected improperly or were on very high-latency connections. These tasks held on to thread pool threads, which prevented the async AcceptCallback in my post from being picked up by the thread pool. This also explains the other symptoms I outlined in the question:

    1. Creating a new connection to a MySQL database involves an async task behind the scenes, which was being starved by the exhaustion.
    2. Completing EndAcceptTcpClient required my async callback to run, which in turn requires an available thread.
    3. Tasks that did not involve the async keyword, such as Timer()-bound tasks (like my logger I/O), were unaffected and could still run.
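One way to make this kind of starvation visible before it takes the server down is to log thread pool usage periodically. The following is a minimal sketch, not code from the original server: `ThreadPoolMonitor`, `Snapshot`, and `Start` are hypothetical names, and the reporting loop runs on a dedicated thread (an assumption made here so that it keeps reporting even if the pool itself is saturated). A busy worker count that climbs steadily and never comes back down would be consistent with the behaviour described above.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original server.
public static class ThreadPoolMonitor
{
    // Returns a one-line snapshot of how many pool threads are currently in use.
    public static string Snapshot()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
        return $"worker threads busy: {maxWorker - freeWorker}/{maxWorker}, " +
               $"IO threads busy: {maxIo - freeIo}/{maxIo}";
    }

    // Starts a dedicated background thread (deliberately not a pool thread)
    // that passes a snapshot to the supplied log callback at a fixed interval.
    public static Thread Start(TimeSpan interval, Action<string> log)
    {
        var thread = new Thread(() =>
        {
            while (true)
            {
                log(Snapshot());
                Thread.Sleep(interval);
            }
        })
        { IsBackground = true, Name = "threadpool-monitor" };

        thread.Start();
        return thread;
    }
}
```

With the logger from the question, this could be started once at startup with something like `ThreadPoolMonitor.Start(TimeSpan.FromSeconds(30), msg => Logger.Write(Logger.Level.INFO, msg));` (the 30-second interval is an arbitrary choice).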

    My solution involved reducing the number of synchronization steps elsewhere in my program, and restructuring any tasks that could take a long time to execute so that they didn't block threads. Thank you to everyone who looked/commented.
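The answer does not show the restructured code, so the following is only a sketch of one way to keep long-running connection work from holding pool threads, assuming the Task-based APIs (`AcceptTcpClientAsync`, `AuthenticateAsServerAsync`, `ReadAsync`) are available on the target runtime. `AsyncGameListener`, `WithTimeout`, and the 30-second idle timeout are hypothetical; the 5-second handshake limit and the TLS settings mirror the question's code.

```csharp
using System;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;

// Hypothetical restructuring, not the original author's code.
public sealed class AsyncGameListener
{
    private readonly X509Certificate _certificate;
    private readonly TimeSpan _handshakeTimeout = TimeSpan.FromSeconds(5);
    private readonly TimeSpan _idleTimeout = TimeSpan.FromSeconds(30); // assumed value

    public AsyncGameListener(X509Certificate certificate)
    {
        _certificate = certificate;
    }

    public async Task ListenAsync(int port)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();
        while (true)
        {
            // Awaiting frees the thread until a client arrives: no polling, no Sleep.
            TcpClient client = await listener.AcceptTcpClientAsync();
            _ = HandleClientAsync(client); // not awaited; errors are handled inside
        }
    }

    private async Task HandleClientAsync(TcpClient client)
    {
        try
        {
            using (client)
            using (var stream = new SslStream(client.GetStream(), false))
            {
                await WithTimeout(
                    stream.AuthenticateAsServerAsync(_certificate, false, SslProtocols.Tls12, true),
                    _handshakeTimeout);

                var buffer = new byte[1024];
                while (true)
                {
                    Task<int> read = stream.ReadAsync(buffer, 0, buffer.Length);
                    await WithTimeout(read, _idleTimeout);
                    int count = await read; // already completed at this point
                    if (count == 0) break;  // client closed the connection
                    // Game-specific packet handling would go here.
                }
            }
        }
        catch (Exception e)
        {
            Console.Error.WriteLine($"Connection ended: {e.Message}");
        }
    }

    // Waits for the work to finish, or throws TimeoutException without blocking a thread.
    public static async Task WithTimeout(Task work, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(work, Task.Delay(timeout));
        if (finished != work)
        {
            // Observe any later fault on the abandoned task so it is not left unobserved.
            _ = work.ContinueWith(t => { var ignored = t.Exception; },
                                  TaskContinuationOptions.OnlyOnFaulted);
            throw new TimeoutException("Operation did not complete in time.");
        }
        await work; // propagate any exception from the completed task
    }
}
```

The point of the design is that a stalled or high-latency client only holds a pending task, not a thread; when the timeout fires, disposing the client and stream tears down the abandoned operation.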