c++ sockets ssl openssl

How to read more than 16'384 bytes of data using OpenSSL?


I am trying to write a HTTPS web server in C++ using the OpenSSL library.

I started from the Simple TLS Server example on OpenSSL's website and have reached a point where I cannot figure out how to read more than 16'384 bytes of data.

All the code is the same as in the Simple TLS Server example, here's the part where I read the data:

if (SSL_accept(ssl) <= 0) {
    ERR_print_errors_fp(stderr);
} else {
    while (true) {
        char buffer[16'384];
        const auto bytes = SSL_read(ssl, buffer, 16'384);

        const auto ssl_error = SSL_get_error(ssl, bytes);
        const auto pending = SSL_pending(ssl);

        std::cout << bytes << std::endl; // This is always 16'384 when the request exceeds 16 kB
        std::cout << ssl_error << std::endl; // This is always 0 (SSL_ERROR_NONE)
        std::cout << pending << std::endl; // This is always 0 since we read a full record of data and there's no more pending data in the record
        // How to know if there is more data to read?

        if (ssl_error == SSL_ERROR_NONE && pending == 0) {
            // This condition will always be true (see previous comments)
            break;
        }
    }

    const auto response = "HTTP/1.1 200 OK\r\nContent-Length: 12\r\n\r\nHello World!";
    SSL_write(ssl, response, strlen(response));
}

I am making requests using Postman or Firefox to test this. In both cases, the condition to exit the loop is true, whether or not the request exceeds 16 kB.

After reading a full record, I want to know if there's another record to read the rest of the request using a loop.

As the comments in my code imply, I've done some research:

  • Especially this comment, which explains in detail how to properly read the request, an approach I've also seen in another official example.

  • I've also tried making the socket async, but SSL_read seems to stay in sync.

  • I've come across this example, which seems to be a fairly simple use of BIOs, but I haven't managed to get it to work either.

  • Of course, I've also seen this post, which pointed me toward why I cannot read more than 16 kB, but it offers no real solution to the problem.

Is using BIO the way to go? If so, how? I don't really understand BIOs. Or is there a solution that doesn't use BIO?


Solution

  • SSL_pending only returns the number of bytes inside the SSL object that are available for immediate read. It says nothing about data still in incompletely received TLS records (which need more data for decryption), data buffered in the OS socket buffer, data still in transit, or data not yet transmitted by the sender.

    Thus, you cannot rely on SSL_pending to detect the end of the data and exit the loop. Instead, call SSL_read again. Only if it returns a value less than or equal to 0 is there nothing more to read from the SSL session (with non-blocking sockets you have to check for temporary errors such as SSL_ERROR_WANT_READ first). Note that you cannot get multiple "messages" inside a TCP/TLS connection this way, since there are no message semantics there - it is only a byte stream. To distinguish between multiple messages you need to implement message semantics on top of the byte stream.

    Specifically, in your case of HTTP/HTTPS, the client will not close or shut down the connection after it is done sending the request. It will just stop sending because it expects the response. In this case SSL_read will simply block, since the connection is still open but no new data arrives. Because of this, a web server actually needs to parse what it receives according to the HTTP specification to determine when the request is finished (i.e. implement message semantics on top of the byte stream) - and then exit the loop even if the connection is not closed. This is not specific to HTTPS; it is the same with plain HTTP, except that plain HTTP would use recv instead of SSL_read.
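
    As a rough illustration of "parse the request to know when it is finished", here is a minimal sketch. It is not a complete HTTP parser: it assumes the request either has no body or announces it with a Content-Length header, and it ignores Transfer-Encoding: chunked, header size limits and pipelined requests. The names ReadFn, content_length and read_http_request are made up for this example. The reader is passed in as a callback so the loop itself does not depend on OpenSSL.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical reader type: fills buf with up to len bytes and returns the
// number of bytes read, or a value <= 0 on error/close (like SSL_read).
using ReadFn = std::function<int(char* buf, int len)>;

// Returns the Content-Length value found in a header block, or 0 if absent.
// Simplification: no validation, no support for chunked transfer encoding.
std::size_t content_length(const std::string& headers) {
    std::string lower(headers);
    for (char& c : lower) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    const std::string key = "\r\ncontent-length:";
    const auto pos = lower.find(key);
    if (pos == std::string::npos) return 0;

    std::size_t i = pos + key.size();
    while (i < lower.size() && (lower[i] == ' ' || lower[i] == '\t')) ++i;

    std::size_t value = 0;
    while (i < lower.size() && std::isdigit(static_cast<unsigned char>(lower[i]))) {
        value = value * 10 + static_cast<std::size_t>(lower[i] - '0');
        ++i;
    }
    return value;
}

// Keeps reading until the header terminator has been seen and the announced
// body length has arrived. Returns std::nullopt if the reader fails first.
std::optional<std::string> read_http_request(const ReadFn& read) {
    std::string request;
    std::size_t header_end = std::string::npos;
    std::size_t total = 0;  // expected request size, known once headers are in
    char buffer[16384];

    while (true) {
        const int n = read(buffer, static_cast<int>(sizeof buffer));
        if (n <= 0) return std::nullopt;  // error or connection closed
        request.append(buffer, static_cast<std::size_t>(n));

        if (header_end == std::string::npos) {
            header_end = request.find("\r\n\r\n");
            if (header_end != std::string::npos) {
                total = header_end + 4 +
                        content_length(request.substr(0, header_end + 2));
            }
        }
        if (header_end != std::string::npos && request.size() >= total) {
            return request;  // complete: stop reading, connection stays open
        }
    }
}
```

    With OpenSSL, the reader could be a lambda such as [ssl](char* b, int n) { return SSL_read(ssl, b, n); }. With a non-blocking socket the lambda would additionally have to handle SSL_ERROR_WANT_READ and retry instead of reporting failure.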