sockets, ssl, openssl, tcp-repair

Is it possible to "move" an established SSL connection to another server?


Let's assume you have a 24/7 server running on a Linux machine that handles incoming connections, both plain TCP and TLS (via OpenSSL). For the service to work, the clients are required to maintain a connection to it at all times. Unfortunately, some of those clients don't reconnect immediately when the server closes the connection, so the server tries its best to keep every connection alive indefinitely.

However, if the server needs to be rebooted, e.g. for maintenance, the connections will be lost.

To avoid disconnecting, I want to "move" established TCP sessions to another machine using the TCP_REPAIR mechanism (https://lwn.net/Articles/495304/). Basically, this means saving the TCP socket state on machine A (such as the sequence and acknowledgment numbers), restoring that state on machine B, and ensuring that new IP packets are routed to the new machine.

This works fairly well with plain TCP: the clients don't notice that the TCP connection has been redirected to another machine. With TLS, however, this obviously requires some more work.

To simplify, let's assume that there are no TLS messages on the wire, no SSL_read or SSL_write is pending, and all previous TLS messages have been sent and received completely.

What I tried so far:

Approach 1: Silently re-create a new SSL object using the same SSL_SESSION

Try to create a new SSL object, make it "established" and attach it to the fd:

  • On the new machine, create a (temporary) client SSL_CTX and add the SSL_SESSION object from the old machine. As this SSL_SESSION object was taken from the server SSL_CTX on the old machine, this approach is probably completely useless, but I gave it a try nevertheless.
  • Create two new SSL objects (one from the temporary client SSL_CTX, one from the server SSL_CTX), connect them via memory BIOs, and set the SSL_SESSION on the client SSL object.
  • Initiate the handshake.
  • After the handshake is completed, change the server SSL object's BIO to a BIO_fd using the reincarnated TCP connection.
  • Destroy the (no longer needed) memory BIOs and the client SSL object.

This didn't work at all. I always see a full handshake, and SSL_session_reused returns 0 for both SSL objects. And even if the SSL_SESSION were reused, I still doubt that this would be sufficient.

Approach 2: The memcpy approach

This is basically an attempt to create something like "i2d_SSL" and "d2i_SSL" methods.

  • Create a plain SSL object from the new server's SSL_CTX.
  • Attach the SSL_SESSION from the old server's SSL object.
  • Using the internal OpenSSL headers, cast the SSL object to an SSL_CONNECTION struct.
  • Copy a few fields from the SSL_CONNECTION struct:
    • the secrets
    • the random fields in the .s3 sub-structure
    • the *_md fields in the .s3.tmp sub-structure
    • the states in the .statem sub-structure
    • ... and a few more (more or less trial-and-error).

I suppose Approach 2 could work, but it's pretty hard to find out which fields are really relevant, if this can work at all.

Can anyone shed some light on this?


Solution

  • I have been asked this question from time to time over the years.

    The answer is that OpenSSL does not support this. An SSL object contains a lot of temporary, connection-specific state, so approach 1 is doomed to failure. For approach 2, there are many dependent sub-objects hidden behind the top-level SSL object, e.g. for tracking cipher and hash states. You would need to replicate all of these, and all of those objects hold references to the specific libssl/libcrypto instance (such as the loaded provider objects). This is unfortunately also doomed to failure.

    Adding this to OpenSSL would be a major feature requiring a lot of effort. It is simply not possible without major changes to the underlying libraries.