
Does memcpy preserve data between different types?


Does calling memcpy between two different structure types preserve the original data, provided the destination buffer is large enough? And is it well-defined to read the copied bytes through members of the other type when the member types overlap?

This should be similar in both C and C++, but here is an example in C++:

#include <iostream>
#include <cstring>

using namespace std;

struct A{
  int a;
  char b[10];
};

struct B{
  int ba;
  int bb;
};

int main(){
    B tmp;
    tmp.ba = 50;
    tmp.bb = 24;
    cout << tmp.ba << tmp.bb << "\n";

    // everything is fine yet

    A obj;
    memcpy(&obj, &tmp, sizeof(tmp));

    // 1. is this valid?
    cout << obj.a << "\n";

    B newB;
    memcpy(&newB, &obj, sizeof(newB));

    // 2. Are these valid?
    cout << newB.ba << newB.bb << "\n";
}

In the example above I've marked two places with comments 1 and 2. Are those reads valid, and is the data preserved as long as the buffer is large enough? Can we do this portably?

The structures and the functions that operate on them come from a C library, but we use and compile them as C++.


Solution

  • The C++ Standard doesn't specify the behaviour of memcpy, other than deferring to the C Standard (perhaps to avoid tackling issues like this!). In the C Standard it's defined as being equivalent to a sequence of character type copies [1].

    So it seems reasonable to treat memcpy(&obj, &tmp, sizeof(tmp)); as:

    unsigned char *dst = (unsigned char *)&obj;
    const unsigned char *src = (const unsigned char *)&tmp;
    for (size_t i = 0; i != sizeof tmp; ++i)
        dst[i] = src[i];
    

    and then use the C++ Standard to cover that code.

    The issues now are:

    1. Do &tmp and &obj actually give the address of the start of each object?
    2. What about padding bytes in obj?
    3. What about uninitialized padding bytes in tmp?
    4. What happens to the values of sub-objects of obj?

    Issue 1: Yes, this is covered by [class.mem]/19, since there are no base class subobjects (and it doesn't overload operator&).
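
    Issue 1 can be spot-checked in code. The following is a minimal sketch, assuming the same A and B as in the question; the helper same_address is a hypothetical name introduced only for illustration. It checks that both types are standard-layout and that the object and its first member share an address.

```cpp
#include <cstddef>
#include <type_traits>

struct A {
    int a;
    char b[10];
};

struct B {
    int ba;
    int bb;
};

// Standard-layout types place their first non-static data member at offset 0.
static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");
static_assert(offsetof(A, a) == 0, "first member of A must sit at offset 0");
static_assert(offsetof(B, ba) == 0, "first member of B must sit at offset 0");

// Hypothetical helper: true if obj and the given member start at the same address.
template <typename Obj, typename Member>
bool same_address(const Obj& obj, const Member& member) {
    return static_cast<const void*>(&obj) == static_cast<const void*>(&member);
}
```

    For example, same_address(obj, obj.a) should hold for any A obj, while same_address(obj, obj.b) should not.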

    Issue 2: I cannot find any text specifically covering this, but the Standard's own example of copying a class-type object into a char buffer and back into the object would not work if writing padding bytes were not permitted.
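
    The round trip mentioned under Issue 2 can be sketched roughly as follows. This is an illustration modelled on that idea, not the Standard's exact text; the function name survives_round_trip and the test values are assumptions made for the example.

```cpp
#include <cstring>
#include <type_traits>

struct A {
    int a;
    char b[10];
};

static_assert(std::is_trivially_copyable<A>::value,
              "byte-wise copying requires a trivially copyable type");

// Hypothetical helper: saves obj's bytes, scribbles over obj, restores the
// bytes, and reports whether the original member values came back.
bool survives_round_trip() {
    A obj{};
    obj.a = 42;
    std::strcpy(obj.b, "hello");

    unsigned char buf[sizeof obj];
    std::memcpy(buf, &obj, sizeof obj);  // copies every byte, padding included

    obj.a = 0;        // modify the object in between
    obj.b[0] = 'x';

    std::memcpy(&obj, buf, sizeof obj);  // writes every byte, padding included
    return obj.a == 42 && std::strcmp(obj.b, "hello") == 0;
}
```

    The second memcpy necessarily writes any padding bytes in obj, which is the point of the argument above.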

    Issue 3: [dcl.init]/12 contains text that explicitly permits the above code to be used on uninitialized data; the corresponding destination bytes then hold indeterminate values. So if uninitialized padding bytes in the source are only mapped onto padding bytes in the destination, it's fine. But if they are mapped onto sub-objects in the destination, then those sub-objects will have indeterminate values.
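
    Whether Issue 3 matters for a given pair of types depends on where the padding is. A rough way to see how much padding a struct carries is to compare its size with the sum of its member sizes. The sketch below assumes the A and B from the question; the function names are hypothetical, and the numbers in the comments are what a typical ABI with 4-byte int gives, not something the Standard guarantees.

```cpp
#include <cstddef>

struct A {
    int a;
    char b[10];
};

struct B {
    int ba;
    int bb;
};

// Bytes in A that belong to no member (typically 2: 16 - (4 + 10)).
constexpr std::size_t padding_bytes_in_A() {
    return sizeof(A) - (sizeof(int) + sizeof(char[10]));
}

// Bytes in B that belong to no member (typically 0).
constexpr std::size_t padding_bytes_in_B() {
    return sizeof(B) - 2 * sizeof(int);
}
```

    If the source type (B here) has no padding, every byte copied out of a fully initialized tmp holds a determinate value, so nothing indeterminate lands in obj.a or obj.b.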

    Issue 4: There's no problem here: the strict aliasing rule permits objects to have some (or all) of their bytes overwritten through a character type expression. Accessing the object later will yield the value corresponding to the new representation, with UB if it doesn't represent a value.
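
    To illustrate Issue 4, the sketch below overwrites only the first sizeof(B) bytes of an A and then checks that obj.a holds the copied value while the tail of obj.b keeps what it held before. The names overwritten_in_b and tail_of_b_survives are hypothetical, and the byte counts in the comments assume a typical ABI with 4-byte int.

```cpp
#include <cstddef>
#include <cstring>

struct A {
    int a;
    char b[10];
};

struct B {
    int ba;
    int bb;
};

// Leading bytes of A::b that a sizeof(B)-byte copy overwrites (typically 4).
constexpr std::size_t overwritten_in_b = sizeof(B) - offsetof(A, b);
static_assert(overwritten_in_b < sizeof(A::b),
              "the copy must leave part of b untouched for this demo");

// Hypothetical helper: partially overwrites obj and inspects the result.
bool tail_of_b_survives() {
    const char text[sizeof(A::b)] = "abcdefghi";  // 9 letters plus '\0'

    A obj{};
    std::memcpy(obj.b, text, sizeof obj.b);

    B tmp{50, 24};
    std::memcpy(&obj, &tmp, sizeof tmp);  // overwrites a and the start of b

    // obj.a now reflects tmp.ba; bytes of b past the copied range are unchanged.
    return obj.a == 50 &&
           std::memcmp(obj.b + overwritten_in_b, text + overwritten_in_b,
                       sizeof obj.b - overwritten_in_b) == 0;
}
```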

    So, all in all, I think your specific example is OK, assuming sizeof(A) >= sizeof(B).
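
    The assumptions behind that conclusion can be made explicit with static_assert, so that a violation fails at compile time rather than at run time. This is a minimal sketch using the question's types; the helper names round_trip_through_A and first_member_via_A are made up for the example.

```cpp
#include <cstring>
#include <type_traits>

struct A {
    int a;
    char b[10];
};

struct B {
    int ba;
    int bb;
};

static_assert(std::is_trivially_copyable<A>::value &&
                  std::is_trivially_copyable<B>::value,
              "memcpy is only meaningful for trivially copyable types");
static_assert(sizeof(A) >= sizeof(B), "A must be large enough to hold a B");

// Hypothetical helper: copies src into an A, then back out into a new B.
B round_trip_through_A(const B& src) {
    A obj;
    std::memcpy(&obj, &src, sizeof src);
    B out;
    std::memcpy(&out, &obj, sizeof out);
    return out;
}

// Hypothetical helper: reads A::a after the first int of a B was copied over it.
int first_member_via_A(const B& src) {
    A obj;
    std::memcpy(&obj, &src, sizeof src);
    return obj.a;
}
```

    With B tmp{50, 24}, round_trip_through_A(tmp) should give back ba == 50 and bb == 24, and first_member_via_A(tmp) should give 50, since both first members are ints at offset 0.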


    [1] In C, memcpy also preserves the effective type of the object. C++ has a different object model and there is no equivalent of this. So if you used similar code with a C compiler, you'd also need to observe the strict aliasing rule between the types in both objects.