c++ sockets tcp

C++ TCP recv unknown buffer size


I want to use the function recv(socket, buf, len, flags) to receive an incoming packet. However, I do not know the length of this packet before runtime, so the first 8 bytes are supposed to tell me its length. I don't want to just allocate an arbitrarily large len to accomplish this, so is it possible to set len = 8 and have buf be of type uint64_t? Then afterwards

memcpy(dest, &buf, buf)?


Solution

  • Since TCP is stream-based, I'm not sure what kind of packets you mean. I will assume that you are referring to application-level packets, that is, packets defined by your application and not by underlying protocols like TCP. I will call them messages instead to avoid confusion.

    I will show two possibilities. First, I will show how you can read a message without knowing its length before you have finished reading. The second example makes two calls: first it reads the size of the message, then it reads the whole message at once.


    Read data until the message is complete

    Since TCP is stream-based, you will not lose any data when your buffer is not big enough. So you can read a fixed number of bytes, and if something is missing, you can call recv again. Here is an extensive example. I wrote it without testing, so treat it as a sketch.

    #include <cerrno>
    #include <cstring>
    #include <vector>
    #include <sys/socket.h>
    #include <sys/types.h>

    // IOException and ProtocolException are your own exception types.

    // Bytes received so far that have not been returned to the caller yet.
    std::vector<char> buf;

    std::vector<char> readMessage(int fd) {
        char chunk[512];
        // buf may already hold a complete message left over from the last call.
        while (!isMessageComplete(buf)) {
            ssize_t ret = recv(fd, chunk, sizeof(chunk), 0);
            if (ret < 0) {
                if (errno == EINTR) {
                    // Interrupted, just try again ...
                    continue;
                }
                // Error occurred. Throw exception.
                throw IOException(strerror(errno));
            } else if (ret == 0) {
                // No data available anymore.
                if (buf.empty()) {
                    // Peer just closed the connection
                    return std::vector<char>(); // return empty vector
                }
                // Peer closed the connection in the middle of a message.
                // It is not a clean shutdown. Throw exception.
                throw ProtocolException("Unexpected end of stream");
            }
            // Append only the bytes that were actually received.
            buf.insert(buf.end(), chunk, chunk + ret);
        }
        // Message is complete. buf may already contain the beginning of
        // the next message; keep those bytes for the next call.
        std::size_t msgLen = getSizeOfMessage(buf);
        std::vector<char> msg(buf.begin(), buf.begin() + msgLen);
        buf.erase(buf.begin(), buf.begin() + msgLen);
        return msg;
    }
    

    You have to define bool isMessageComplete(const std::vector<char>&) and std::size_t getSizeOfMessage(const std::vector<char>&) yourself, because they depend on your message format.
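
    As an illustration, here is a minimal sketch of the two helpers for the framing described in the question: an 8-byte length header followed by the payload. It assumes that the header holds the payload length in host byte order and that the vector contains only bytes that were actually received. The constant kHeaderSize is a name made up for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed framing: 8-byte payload length (host byte order), then the payload.
constexpr std::size_t kHeaderSize = sizeof(std::uint64_t);

// Total size of the first message in data (header + payload).
// Precondition: data holds at least kHeaderSize bytes.
std::size_t getSizeOfMessage(const std::vector<char>& data) {
    std::uint64_t payloadLen = 0;
    std::memcpy(&payloadLen, data.data(), kHeaderSize);
    return kHeaderSize + static_cast<std::size_t>(payloadLen);
}

// True once data holds the header and the whole payload of the first message.
bool isMessageComplete(const std::vector<char>& data) {
    if (data.size() < kHeaderSize) {
        return false; // header not fully received yet
    }
    return data.size() >= getSizeOfMessage(data);
}
```

    With these definitions a returned message still starts with its 8-byte header; strip it in the caller if you only need the payload.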

    Read the header and check the length of the message

    The second possibility is to read the header first, which in your case is just the 8 bytes that contain the size of the message. After that, you know the size of the message. This means you can allocate enough storage and read the whole message at once:

    /// Reads exactly n bytes from fd. Returns false if the peer closed the
    /// connection before the first byte arrived.
    bool readNBytes(int fd, void *buf, std::size_t n) {
        std::size_t offset = 0;
        char *cbuf = static_cast<char*>(buf);
        while (offset < n) {
            ssize_t ret = recv(fd, cbuf + offset, n - offset, MSG_WAITALL);
            if (ret < 0) {
                if (errno != EINTR) {
                    // Error occurred
                    throw IOException(strerror(errno));
                }
                // Interrupted by a signal: just try again
            } else if (ret == 0) {
                // No data available anymore
                if (offset == 0) return false;
                else             throw ProtocolException("Unexpected end of stream");
            } else {
                offset += ret;
            }
        }
        // All n bytes read (also covers n == 0, where recv is never called)
        return true;
    }
    
    /// Reads message from fd
    std::vector<char> readMessage(int fd) {
        std::uint64_t size;
        if (readNBytes(fd, &size, sizeof(size))) {
            // size comes from the peer: check it against a sensible maximum
            // before allocating if you do not trust the other side.
            std::vector<char> buf(size);
            if (readNBytes(fd, buf.data(), size)) {
                return buf;
            } else {
                throw ProtocolException("Unexpected end of stream");
            }
        } else {
            // connection was closed
            return std::vector<char>();
        }
    }
    

    The flag MSG_WAITALL requests that the function block until the full amount of data is available. However, you cannot rely on that: the call may still return fewer bytes, for example when a signal is caught or when an error or disconnect occurs. You have to check the return value and read again if something is missing, just as I did above.

    readNBytes(fd, buf, n) reads n bytes. As long as the connection is not closed by the other side, the function does not return before it has read n bytes. If the other side closed the connection before any byte was read, the function returns false. If the connection was closed in the middle of a message, an exception is thrown. If an I/O error occurred, another exception is thrown.

    readMessage reads 8 bytes [sizeof(std::uint64_t)] and uses them as the size of the next message. Then it reads the message.
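
    For completeness, the sending side has to write the same framing. The following is only a sketch of a possible counterpart: writeNBytes and writeMessage are hypothetical names, std::runtime_error stands in for your own exception types, and the size is sent in host byte order to match readMessage as written above.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Sends exactly n bytes; send() may accept fewer bytes than requested.
void writeNBytes(int fd, const void *buf, std::size_t n) {
    const char *cbuf = static_cast<const char*>(buf);
    std::size_t offset = 0;
    while (offset < n) {
        ssize_t ret = send(fd, cbuf + offset, n - offset, 0);
        if (ret < 0) {
            if (errno == EINTR) continue; // interrupted, try again
            throw std::runtime_error(std::strerror(errno));
        }
        offset += static_cast<std::size_t>(ret);
    }
}

// Hypothetical counterpart to readMessage:
// 8-byte size (host byte order), then the payload.
void writeMessage(int fd, const std::vector<char>& payload) {
    std::uint64_t size = payload.size();
    writeNBytes(fd, &size, sizeof(size));
    writeNBytes(fd, payload.data(), payload.size());
}
```

    Just like recv, send is allowed to process only part of the buffer, which is why the loop is needed on this side as well.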

    If you want platform independence, you should convert size to a defined byte order before sending it and convert it back after receiving it. Computers with x86 architecture use little endian, while it is common to use big endian in network traffic.
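
    One portable way to do this conversion, sketched here under the assumption that you choose big endian for the 8-byte size field, is to build the value byte by byte with shifts. This works the same on little-endian and big-endian hosts. The function names are made up for this example.

```cpp
#include <cstdint>

// Writes value into out[0..7], most significant byte first (big endian).
void encodeBigEndian64(std::uint64_t value, unsigned char out[8]) {
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}

// Reads a big-endian 64-bit value from in[0..7].
std::uint64_t decodeBigEndian64(const unsigned char in[8]) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value = (value << 8) | in[i];
    }
    return value;
}
```

    The sender would encode the size into 8 bytes and send those; the receiver would read the 8 bytes into an unsigned char array and decode them instead of reading directly into a std::uint64_t.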

    Note: With MSG_PEEK it is possible to implement this functionality for UDP. With this flag you can read the header without removing the datagram from the receive queue. Then you can allocate enough space for the whole datagram and receive it with a second call.
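
    A rough sketch of that idea for a datagram socket follows. It assumes each datagram starts with an 8-byte payload size in host byte order. readDatagram and the 65535-byte limit are assumptions of this example, std::runtime_error stands in for your own exception types, and EINTR handling is left out for brevity.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Receives one datagram and returns its payload (without the 8-byte header).
std::vector<char> readDatagram(int fd) {
    std::uint64_t size = 0;
    // MSG_PEEK copies the header but leaves the datagram in the receive queue.
    ssize_t ret = recv(fd, &size, sizeof(size), MSG_PEEK);
    if (ret < 0) {
        throw std::runtime_error(std::strerror(errno));
    }
    if (static_cast<std::size_t>(ret) < sizeof(size)) {
        throw std::runtime_error("Datagram shorter than header");
    }
    if (size > 65535) {
        // Assumed sanity limit: do not allocate based on a bogus header.
        throw std::runtime_error("Announced size too large");
    }

    // The size is known now; this recv removes the datagram from the queue.
    std::vector<char> datagram(sizeof(size) + static_cast<std::size_t>(size));
    ret = recv(fd, datagram.data(), datagram.size(), 0);
    if (ret < 0) {
        throw std::runtime_error(std::strerror(errno));
    }
    datagram.resize(static_cast<std::size_t>(ret));
    return std::vector<char>(datagram.begin() + sizeof(size), datagram.end());
}
```

    Unlike with TCP, no loop is needed here: a datagram is delivered as a whole or not at all, so one recv with a large enough buffer returns the complete message.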