Search code examples
cgccmemcpy

Does memcpy really a copy of the memory?


I'm experimenting with memory handling in C.

Giving the following code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char BYTE;

typedef struct Data {
    int valid;
    double value;
} Data;

typedef struct Message {
    int id;
    int size;
    int nr;
    Data *data;
} Message;

int main() {
    int sz = 5;
    int id = 1;
    int i;

    Message msg;
    msg.id = id;
    msg.size = 0;
    msg.nr = sz;
    msg.data = malloc(sizeof(Data) * msg.nr);
    for (i = 0; i < msg.nr; i++) {
        msg.data[i].valid = 1;
        msg.data[i].value = (double)i;
    }

    printf("Input data\nid: %d\nsize: %d\nnr: %d\n",
           msg.id, msg.size, msg.nr);
    
    for (i = 0; i < sz; i++)
        printf("msg.data[%d].valid: %d\nmsg.data[%d].value: %lf\n",
               i, msg.data[i].valid, i, msg.data[i].value);

    int bufferSize = sizeof(msg) + (sizeof(Data) * msg.nr);
    msg.size = bufferSize;
    printf("bufferSize: %d\n", bufferSize);

    BYTE *buffer = malloc(sizeof(BYTE) * bufferSize);
    memcpy(buffer, &msg, bufferSize);

    if (msg.data != NULL)
        free(msg.data);
     
    // test
    Message *p = (Message *)buffer;
    Message rcv;
    rcv.id = 0;
    rcv.size = 0;
    rcv.nr = 0;
    rcv.data = malloc(sizeof(Data) * p->nr);

    memcpy(&rcv, buffer, p->size);
    
    printf("Output data\nid: %d\nsize: %d\nnr: %d\n",
           rcv.id, rcv.size, rcv.nr);
    
    for (i = 0; i < sz; i++)
        printf("rcv.data[%d].valid: %d\nrcv.data[%d].value: %lf\n",
               i, rcv.data[i].valid, i, rcv.data[i].value);

    if (rcv.data != NULL)
        free(rcv.data);

    if (buffer != NULL)
        free(buffer);
}

I'm obtaining the following error at the end of the execution of the code *** stack smashing detected ***: terminated

Going deeper what I found is that msg.data and rcv.data point to the same memory address

memory location

and when I free the rcv.data basically I'm releasing a memory location that as been released before.

I read that memcpy should create a copy, but I had a different experience. I don't understand why it's happening.

I use gcc as compiler, and I tried to run the code in different machines, but I obtain the same result always. Why this behavior?


Solution

  • Why the code contains undefined behaviors

    The first call to memcpy reads starting from the address of msg for bufferSize bytes, where bufferSize is larger than the size of msg itself. Reading beyond the size of the object causes undefined behaviors.

    int bufferSize = sizeof(msg) + (sizeof(Data) * msg.nr);
    // This reads beyond the size of `msg`
    memcpy(buffer, &msg, bufferSize);
    

    The memcpy function copies the data byte-by-byte, including the Data* data field of Message, which is why you're getting the same address for both msg.data and rcv.data. memcpy does not attempt to create a copy of the array pointed to by the data pointer. Copying bytes beyond the end of msg also doesn't help, because msg (which is on the stack) is not going to be adjacent with the array pointed to by msg.data (which is on the heap).

    memcpy entry on cppreference

    Solution for arbitrary length array

    To fix this, call memcpy two times separately to serialize the Message and the array of Data.

    int bufferSize = sizeof(Message) + sizeof(Data) * msg.nr;
    BYTE* buffer = malloc(sizeof(BYTE) * bufferSize);
    memcpy(buffer, &msg, sizeof(Message));
    memcpy(buffer + sizeof(Message), msg.data, sizeof(Data) * msg.nr);
    

    Similarly, on the receiver end, call memcpy two times to for the Message and the array of Data.

    Message rcv;
    memcpy(&rcv, buffer, sizeof(Message));
    rcv.data = malloc(sizeof(Data) * rcv.nr);
    memcpy(rcv.data, buffer + sizeof(Message), sizeof(Data) * rcv.nr);
    

    Try it interactively on godbolt

    Solution for size-limited array

    If it's acceptable to have the data array to have a fixed maximum size NUM_DATA_MAX, then (de)serialization can be done with 1, instead of 2, calls to memcpy. To do so, define Message to contain an array data, instead of a pointer to an array.

    #define NUM_DATA_MAX 10
    
    typedef struct Data {
        int valid;
        double value;
    } Data;
    
    typedef struct Message {
        int id;
        int size;
        int nr;
        Data data[NUM_DATA_MAX];
    } Message;
    

    On the sender end

    int bufferSize = sizeof(Message);
    BYTE* buffer = malloc(bufferSize);
    memcpy(buffer, &msg, sizeof(Message));
    

    On the receiver end

    Message rcv;
    memcpy(&rcv, buffer, sizeof(Message));