What is the best way to determine packet size with recv()?

Extremely new to socket programming and C in general. I am trying to write a basic program to send and receive data between two machines. I understand that recv will not get all your data at once -- you essentially have to loop it until it has read the whole message.

In lieu of just setting a limit on both machines, I have created a simple Message struct on the client side:

struct Message {
    size_t length;
    char contents[1024 - sizeof(size_t)];
} message; 
message.length = sizeof(struct Message);
message.contents = information_i_want_to_send;

When it arrives at the server, I have recv read into a buffer: received = recv(ioSock, &buffer, 1024, 0) (Which coincidentally is the same size as my Message struct -- but assuming it wasn't...).

I then extract Message.length from the buffer like this:

size_t messagelength;
messagelength = *((size_t *) &buffer);

Then I loop recv into the buffer while received < messagelength. This works, but I can't help feeling it's ugly and hacky, especially if the first recv call reads less than sizeof(size_t), or if the machines have different architectures, in which case the size_t cast won't work. Is there a better way to do this?

Solution

You have a fixed-size message, so you can use something like this:

#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

// Returns the number of bytes read.
// EOF was reached if the number of bytes read is less than requested.
// On error, returns -1 and sets errno.
ssize_t recv_fixed_amount(int sockfd, void *buf, size_t size) {
   if (size > SSIZE_MAX) {
      errno = EINVAL;
      return -1;
   }

   // Takes a void * so the address of any object can be passed in;
   // a char * is used internally for the pointer arithmetic.
   char *p = buf;
   ssize_t bytes_read = 0;
   while (size > 0) {
      ssize_t rv = recv(sockfd, p, size, 0);
      if (rv < 0)
         return -1;
      if (rv == 0)
         return bytes_read;

      size -= rv;
      bytes_read += rv;
      p += rv;
   }

   return bytes_read;
}

It would be used something like this:

typedef struct {
   uint32_t length;
   char contents[1020];
} Message;

Message message;

ssize_t bytes_read = recv_fixed_amount(sockfd, &(message.length), sizeof(message.length));
if (bytes_read == 0) {
   printf("EOF reached\n");
   exit(EXIT_SUCCESS);
}

if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(message.length)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

bytes_read = recv_fixed_amount(sockfd, message.contents, sizeof(message.contents));
if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(message.contents)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

Notes:

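size_t is not going to be the same size everywhere, so I switched to a uint32_t. Byte order can differ between machines too, so both sides need to agree on one (e.g. by converting with htonl() when sending and ntohl() when receiving).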
I read the fields independently because the padding within the struct can vary between implementations. They would need to be sent that way as well.
The receiver is populating message.length with the information from the stream, but doesn't actually use it.
A malicious or buggy sender could provide a value for message.length that's too large and crash the receiver (or worse) if the receiver doesn't validate it. The same goes for contents: it might not be NUL-terminated even if the receiver expects it to be.
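
Since the fields have to be sent one by one as well, here is a minimal sketch of what the sending side might look like. The helpers send_fixed_amount and send_message are hypothetical names, not part of the code above, and the sketch assumes both sides agree to put the length on the wire in network byte order (so the receiver would apply ntohl() to it).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: loops until all `size` bytes have been handed
// to the kernel. Returns 0 on success. On error, returns -1 and sets errno.
int send_fixed_amount(int sockfd, const void *buf, size_t size) {
   assert(buf != NULL || size == 0);
   const char *p = buf;
   while (size > 0) {
      ssize_t rv = send(sockfd, p, size, 0);
      if (rv < 0)
         return -1;
      size -= (size_t)rv;
      p += rv;
   }
   return 0;
}

// Sends one fixed-size message: a 4-byte length followed by a
// zero-padded 1020-byte contents field, each field sent separately
// so struct padding never reaches the wire.
// Assumption: the length travels in network byte order.
int send_message(int sockfd, const char *text) {
   assert(text != NULL);
   char contents[1020] = {0};
   size_t text_len = strlen(text);
   if (text_len > sizeof(contents)) {
      errno = EMSGSIZE;
      return -1;
   }
   memcpy(contents, text, text_len);

   uint32_t wire_length = htonl((uint32_t)text_len);
   if (send_fixed_amount(sockfd, &wire_length, sizeof(wire_length)) < 0)
      return -1;
   return send_fixed_amount(sockfd, contents, sizeof(contents));
}
```

Note that send() can raise SIGPIPE if the peer has already closed the connection; dealing with that is left out of the sketch.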

But what if the length wasn't fixed? Then the sender would need to somehow communicate how much the reader needs to read. A common approach is a length prefix.

typedef struct {
   uint32_t length;
   char contents[];
} Message;

uint32_t contents_size;
ssize_t bytes_read = recv_fixed_amount(sockfd, &contents_size, sizeof(contents_size));
if (bytes_read == 0) {
   printf("EOF reached\n");
   exit(EXIT_SUCCESS);
}

if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != sizeof(contents_size)) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

Message *message = malloc(sizeof(Message)+contents_size);
if (!message) {
   perror("malloc");
   exit(EXIT_FAILURE);
}

message->length = contents_size;

bytes_read = recv_fixed_amount(sockfd, message->contents, contents_size);
if (bytes_read < 0) {
   perror("recv");
   exit(EXIT_FAILURE);
}

if (bytes_read != contents_size) {
   fprintf(stderr, "recv: Premature EOF.\n");
   exit(EXIT_FAILURE);
}

Notes:

message->length contains the size of message->contents instead of the size of the structure. This is far more useful.
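
The earlier warning about unvalidated lengths matters most here, because the length now drives a malloc. A rough sketch of the same receive logic wrapped in a function that checks the length first might look like this. MAX_CONTENTS_SIZE and recv_message are hypothetical names, the limit is an arbitrary choice, and the sketch assumes the sender transmits the length in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical upper bound; pick whatever your protocol allows.
#define MAX_CONTENTS_SIZE (1024u * 1024u)

typedef struct {
   uint32_t length;   // size of contents, in host byte order
   char contents[];
} Message;

// Same contract as the recv_fixed_amount() shown earlier; repeated here
// in compact form so the sketch compiles by itself.
static ssize_t recv_fixed_amount(int sockfd, void *buf, size_t size) {
   char *p = buf;
   size_t total = 0;
   while (total < size) {
      ssize_t rv = recv(sockfd, p + total, size - total, 0);
      if (rv < 0)
         return -1;
      if (rv == 0)
         break;
      total += (size_t)rv;
   }
   return (ssize_t)total;
}

// Reads one length-prefixed message into a malloc'ed Message (caller frees).
// Returns 1 on success, 0 on a clean EOF before the message starts,
// and -1 on error with errno set (EPROTO for a truncated message,
// EMSGSIZE for a length above MAX_CONTENTS_SIZE).
int recv_message(int sockfd, Message **out) {
   assert(out != NULL);
   uint32_t wire_length;
   ssize_t n = recv_fixed_amount(sockfd, &wire_length, sizeof(wire_length));
   if (n < 0)
      return -1;
   if (n == 0)
      return 0;
   if ((size_t)n != sizeof(wire_length)) {
      errno = EPROTO;
      return -1;
   }

   uint32_t contents_size = ntohl(wire_length);
   if (contents_size > MAX_CONTENTS_SIZE) {
      errno = EMSGSIZE;   // reject the length before allocating anything
      return -1;
   }

   Message *message = malloc(sizeof(Message) + contents_size);
   if (!message)
      return -1;
   message->length = contents_size;

   n = recv_fixed_amount(sockfd, message->contents, contents_size);
   if (n < 0 || (size_t)n != contents_size) {
      int saved = (n < 0) ? errno : EPROTO;
      free(message);
      errno = saved;
      return -1;
   }

   *out = message;
   return 1;
}
```

Returning a status instead of calling exit() lets the caller decide whether a bad length should close just that connection or end the program.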

Another approach is to use a sentinel value: a value that tells the reader the message is over. The NUL that terminates a C string is such a value. This is more complicated because you don't know how much to read in advance. Reading byte-by-byte is too expensive, so one normally uses a buffer.

while (1) {
    extend_buffer_if_necessary();
    if (recv_into_buffer() <= 0)
        break;  // EOF or error
    while (buffer_contains_a_sentinel()) {
        // This also shifts the remainder of the buffer's contents.
        extract_contents_of_buffer_up_to_sentinel();
        process_extracted_message();
    }
}

The advantage of using a sentinel value is that one doesn't need to know the length of the message in advance (so the sender can start sending it before it's fully created.)

The disadvantage is the same as for C strings: The message can't contain the sentinel value unless some form of escaping mechanism is used. Between this and the complexity of the reader, you can see why a length prefix is usually preferred over a sentinel value. :)
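
To give a feel for the reader's complexity, here is a minimal sketch of a sentinel-based reader. It assumes a newline as the sentinel and that messages never contain one. The names read_messages, message_fn, Stats and record_message are all hypothetical, and a real reader would also want to cap how far the buffer may grow.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SENTINEL '\n'   // assumption: messages never contain a newline

// Hypothetical callback type: called once per complete message
// (sentinel excluded; msg is NOT NUL-terminated).
typedef void (*message_fn)(const char *msg, size_t len, void *ctx);

// Reads until EOF, calling on_message for every sentinel-terminated message.
// Returns 0 on EOF, -1 on error (errno set by recv or realloc).
int read_messages(int sockfd, message_fn on_message, void *ctx) {
   assert(on_message != NULL);
   char *data = NULL;   // bytes received but not yet handed out
   size_t used = 0;     // number of valid bytes in data
   size_t cap = 0;      // allocated size of data
   int result = 0;

   for (;;) {
      // Extend the buffer if it is full.
      if (used == cap) {
         size_t new_cap = cap ? cap * 2 : 4096;
         char *p = realloc(data, new_cap);
         if (!p) { result = -1; break; }
         data = p;
         cap = new_cap;
      }

      ssize_t rv = recv(sockfd, data + used, cap - used, 0);
      if (rv < 0) { result = -1; break; }
      if (rv == 0) break;   // EOF; an unterminated tail is discarded
      used += (size_t)rv;

      // Hand out every complete message currently in the buffer.
      size_t start = 0;
      char *end;
      while ((end = memchr(data + start, SENTINEL, used - start)) != NULL) {
         size_t len = (size_t)(end - (data + start));
         on_message(data + start, len, ctx);
         start += len + 1;   // skip the message and its sentinel
      }

      // Shift the incomplete remainder to the front of the buffer.
      memmove(data, data + start, used - start);
      used -= start;
   }

   int saved = errno;   // keep errno intact across free()
   free(data);
   errno = saved;
   return result;
}

// Example callback: counts messages and remembers the last length.
typedef struct { int count; size_t last_len; } Stats;

void record_message(const char *msg, size_t len, void *ctx) {
   (void)msg;
   Stats *s = ctx;
   s->count++;
   s->last_len = len;
}
```

A single recv can deliver several messages plus part of the next one, which is why the inner loop drains every complete message before the remainder is shifted down.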

Finally, there's a better solution than sentinel values for large messages that you want to start sending before they are fully created: A sequence of length-prefixed chunks. One keeps reading chunks until a chunk of size 0 is encountered, signaling the end.

HTTP supports both length-prefixed messages (in the form of the Content-Length: <length> header) and this approach (in the form of the Transfer-Encoding: chunked header).