I'm experimenting with memory handling in C.
Giving the following code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned char BYTE;
typedef struct Data {
int valid;
double value;
} Data;
typedef struct Message {
int id;
int size;
int nr;
Data *data;
} Message;
int main() {
int sz = 5;
int id = 1;
int i;
Message msg;
msg.id = id;
msg.size = 0;
msg.nr = sz;
msg.data = malloc(sizeof(Data) * msg.nr);
for (i = 0; i < msg.nr; i++) {
msg.data[i].valid = 1;
msg.data[i].value = (double)i;
}
printf("Input data\nid: %d\nsize: %d\nnr: %d\n",
msg.id, msg.size, msg.nr);
for (i = 0; i < sz; i++)
printf("msg.data[%d].valid: %d\nmsg.data[%d].value: %lf\n",
i, msg.data[i].valid, i, msg.data[i].value);
int bufferSize = sizeof(msg) + (sizeof(Data) * msg.nr);
msg.size = bufferSize;
printf("bufferSize: %d\n", bufferSize);
BYTE *buffer = malloc(sizeof(BYTE) * bufferSize);
memcpy(buffer, &msg, bufferSize);
if (msg.data != NULL)
free(msg.data);
// test
Message *p = (Message *)buffer;
Message rcv;
rcv.id = 0;
rcv.size = 0;
rcv.nr = 0;
rcv.data = malloc(sizeof(Data) * p->nr);
memcpy(&rcv, buffer, p->size);
printf("Output data\nid: %d\nsize: %d\nnr: %d\n",
rcv.id, rcv.size, rcv.nr);
for (i = 0; i < sz; i++)
printf("rcv.data[%d].valid: %d\nrcv.data[%d].value: %lf\n",
i, rcv.data[i].valid, i, rcv.data[i].value);
if (rcv.data != NULL)
free(rcv.data);
if (buffer != NULL)
free(buffer);
}
I'm obtaining the following error at the end of the execution of the code
*** stack smashing detected ***: terminated
Going deeper what I found is that msg.data
and rcv.data
point to the same memory address
and when I free the rcv.data
basically I'm releasing a memory location that as been released before.
I read that memcpy
should create a copy, but I had a different experience.
I don't understand why it's happening.
I use gcc as compiler, and I tried to run the code in different machines, but I obtain the same result always. Why this behavior?
Why the code contains undefined behaviors
The first call to memcpy
reads starting from the address of msg
for bufferSize
bytes, where bufferSize
is larger than the size of msg
itself. Reading beyond the size of the object causes undefined behaviors.
int bufferSize = sizeof(msg) + (sizeof(Data) * msg.nr);
// This reads beyond the size of `msg`
memcpy(buffer, &msg, bufferSize);
The memcpy
function copies the data byte-by-byte, including the Data* data
field of Message
, which is why you're getting the same address for both msg.data
and rcv.data
.
memcpy
does not attempt to create a copy of the array pointed to by the data
pointer. Copying bytes beyond the end of msg
also doesn't help, because msg
(which is on the stack) is not going to be adjacent with the array pointed to by msg.data
(which is on the heap).
Solution for arbitrary length array
To fix this, call memcpy
two times separately to serialize the Message
and the array of Data
.
int bufferSize = sizeof(Message) + sizeof(Data) * msg.nr;
BYTE* buffer = malloc(sizeof(BYTE) * bufferSize);
memcpy(buffer, &msg, sizeof(Message));
memcpy(buffer + sizeof(Message), msg.data, sizeof(Data) * msg.nr);
Similarly, on the receiver end, call memcpy
two times to for the Message
and the array of Data
.
Message rcv;
memcpy(&rcv, buffer, sizeof(Message));
rcv.data = malloc(sizeof(Data) * rcv.nr);
memcpy(rcv.data, buffer + sizeof(Message), sizeof(Data) * rcv.nr);
Try it interactively on godbolt
Solution for size-limited array
If it's acceptable to have the data
array to have a fixed maximum size NUM_DATA_MAX
, then (de)serialization can be done with 1, instead of 2, calls to memcpy
. To do so, define Message
to contain an array data
, instead of a pointer to an array.
#define NUM_DATA_MAX 10
typedef struct Data {
int valid;
double value;
} Data;
typedef struct Message {
int id;
int size;
int nr;
Data data[NUM_DATA_MAX];
} Message;
On the sender end
int bufferSize = sizeof(Message);
BYTE* buffer = malloc(bufferSize);
memcpy(buffer, &msg, sizeof(Message));
On the receiver end
Message rcv;
memcpy(&rcv, buffer, sizeof(Message));