Reliably and portably store and retrieve objects of structure type in C


@bdonlan, in Copying structure in C with assignment instead of memcpy(), lists several reasons for using memcpy to copy objects of structure type. I have one more reason: I want to use the same area of memory to store and retrieve arbitrary objects (of possibly different structure type) at different times (like storage on a pre-allocated heap).

I want to know:

  • how this can be done portably (in the sense that the behavior is defined by the Standard) and
  • what parts of the Standard allow me to reasonably assume that it can be done portably.

Here is an MRE (sorta: not so much on the "M" [minimal] and I'm basically asking about the "R" [reproducible]):

Edit: I hope to have placed a better example after this one. I'm leaving this one here so as to provide a reference for the answers and comments thus far.

// FILE: memcpy_struct.c

#include <limits.h> // CHAR_BIT
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// EDIT: @john-bollinger POINTS OUT THAT THE FOLLOWING LINE
//  IS NOT PORTABLE.
// typedef struct { } structure ;
// INSTEAD:
typedef struct { char dummy ; } structure ;

typedef struct {
    unsigned long long u ; unsigned long long v ;
} unsignedLongLong2; // TWICE AS MANY BITS AS long long

typedef struct
{
    unsigned long long u ; unsigned long long v ;
    unsigned long long w ; unsigned long long x ;
} unsignedLongLong4; // FOUR TIMES AS MANY BITS AS long long

typedef unsigned char byte ;

void store ( byte * target , const structure * source , size_t size ) {
    memcpy ( target , source , size ) ;
}

void fetch ( structure * target , const byte * source , size_t size ) {
    memcpy ( target , source , size ) ;
}

const size_t enough =
    sizeof ( unsignedLongLong2 ) < sizeof ( unsignedLongLong4 )
    ? sizeof ( unsignedLongLong4 ) : sizeof ( unsignedLongLong2 ) ;

int main ( void )
{
    byte * memory = malloc ( enough ) ;
    if ( memory == NULL ) return EXIT_FAILURE ; // ALLOCATION FAILED
    unsignedLongLong2 v0 = { 0xabacadabaabacada , 0xbaabacadabaabaca } ;
    unsignedLongLong4 w0= {
        0xabacadabaabacada , 0xbaabacadabaabaca ,
        0xdabaabacadabaaba , 0xcadabaabacadabaa } ;
    unsignedLongLong2 v1 ;
    unsignedLongLong4 w1 ;
    store ( memory ,   ( structure * ) & v0 ,   sizeof v0 ) ;
    fetch ( ( structure * ) & v1 ,   memory ,   sizeof v1 ) ;
    store ( memory ,   ( structure * ) & w0 ,   sizeof w0 ) ;
    fetch ( ( structure * ) & w1 ,   memory ,   sizeof w1 ) ;
    // EACH BUFFER HOLDS A BASE-2 REPRESENTATION + TERMINATING NULL CHARACTER.
    char s [ 1 + sizeof w0 * CHAR_BIT ] ;
    char t [ 1 + sizeof w0 * CHAR_BIT ] ;
    sprintf ( s, "%llx-%llx",  v0 . u,  v0 . v ) ;
    sprintf ( t, "%llx-%llx",  v1 . u,  v1 . v ) ;
    puts ( s ) ;   puts ( t ) ;
    puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
    sprintf ( s, "%llx-%llx-%llx-%llx",  w0 . u,  w0 . v,  w0 . w,  w0 . x ) ;
    sprintf ( t, "%llx-%llx-%llx-%llx",  w1 . u,  w1 . v,  w1 . w,  w1 . x ) ;
    puts ( s ) ;   puts ( t ) ;
    puts ( strcmp ( s , t ) ? "UNEQUAL" : "EQUAL" ) ;
    free ( memory ) ;
}

Compiled with

gcc -std=c11 memcpy_struct.c # can do C99 or C17, too

Output of corresponding executable

abacadabaabacada-baabacadabaabaca
abacadabaabacada-baabacadabaabaca
EQUAL
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
abacadabaabacada-baabacadabaabaca-dabaabacadabaaba-cadabaabacadabaa
EQUAL

But what guarantees that the pairs of outputs will always be EQUAL, provided that the Standard is respected? I think the following helps (N2176 Types 6.2.5-28):

All pointers to structure types shall have the same representation and alignment requirements as each other.

Edit: After considering the answers and comments, I think the following is a better MRE:

// FILE: memcpy_struct-1.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t length ;
} array_header ;

typedef struct
{
    size_t capacity ;
    size_t length ;
} buffer_header ;

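const size_t hsize_max =
    sizeof ( array_header ) < sizeof ( buffer_header )
    ? sizeof ( buffer_header ) : sizeof ( array_header ) ;

const size_t block = 512u ;

int main ( void )
{
    // COMPUTED INSIDE main: A FILE-SCOPE INITIALIZER MUST BE A CONSTANT
    //  EXPRESSION, AND const-QUALIFIED OBJECTS ARE NOT CONSTANT EXPRESSIONS IN C.
    // NOTE THE PARENTHESES: ! BINDS MORE TIGHTLY THAN %.
    const size_t pageSize = block * ( 1 +
        ( hsize_max / block + ! ! ( hsize_max % block ) ) ) ;
    void * memory = malloc ( pageSize ) ;
    if ( memory == NULL ) return EXIT_FAILURE ; // ALLOCATION FAILED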
    array_header a0 = { 42u } ;
    buffer_header b0 = { 42u , 0u } ;
    array_header a1 ;
    buffer_header b1 ;
    memcpy ( memory ,     & a0 , sizeof a0 ) ;
    memcpy (   & a1 ,   memory , sizeof a1 ) ;
    memcpy ( memory ,     & b0 , sizeof b0 ) ;
    memcpy (   & b1 ,   memory , sizeof b1 ) ;
    fputs ( "array_header-s are " , stdout ) ;
    puts ( a0.length == a1.length ? "EQUAL" : "UNEQUAL" ) ;
    fputs ( "buffer_header-s are " , stdout ) ;
    puts ( b0.capacity == b1.capacity && b0.length == b1.length
        ? "EQUAL" : "UNEQUAL" ) ;
    free ( memory ) ;
}

Solution

  • Since you are asking about portability and the provisions of the standard, the very first thing that came to mind was that structure types without any members, such as this ...

    typedef struct { } structure ;
    

    ... are a non-portable extension. Your objective there seems to be to use structure * as a generic pointer-to-structure type, but you don't need that when you have void * available as a generic pointer-to-anything type. And with void *, you even get the pointer conversions automatically, without the explicit casts. Note also that you eventually get the conversions to void * anyway when you call memcpy().
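As a minimal sketch of that suggestion, the question's store and fetch functions can take void * parameters, so callers need no casts. The unsignedLongLong2 type is copied from the question; void_pointer_demo is a hypothetical helper added only for illustration, and the assert calls are an assumption about how one might guard against null pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as unsignedLongLong2 in the question. */
typedef struct {
    unsigned long long u;
    unsigned long long v;
} unsignedLongLong2;

/* Copy size bytes of the object at source into the storage at target. */
void store(void *target, const void *source, size_t size) {
    assert(target != NULL && source != NULL);
    memcpy(target, source, size);
}

/* Copy size bytes from the storage at source into the object at target. */
void fetch(void *target, const void *source, size_t size) {
    assert(target != NULL && source != NULL);
    memcpy(target, source, size);
}

/* Hypothetical helper: round-trips one structure through a byte buffer.
   Returns 1 if the members read back unchanged, 0 otherwise. */
int void_pointer_demo(void) {
    unsigned char memory[sizeof(unsignedLongLong2)];
    unsignedLongLong2 v0 = { 0xabacadabaabacadaULL, 0xbaabacadabaabacaULL };
    unsignedLongLong2 v1;

    store(memory, &v0, sizeof v0);   /* pointer converts to const void * implicitly */
    fetch(&v1, memory, sizeof v1);   /* no cast needed here either */
    return v1.u == v0.u && v1.v == v0.v;
}
```

Calling void_pointer_demo() from main should return 1; the same store and fetch work unchanged for any other object type.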

    I want to use the same area of memory to store and retrieve arbitrary objects (of possibly different structure type) at different times (like storage on a pre-allocated heap).

    Ok. That's not a particularly big ask.

    I want to know:

    • how this can be done portably (in the sense that the behavior is defined by the Standard) and

    Your example is fine. Alternatively, if you know in advance all the different structure types that you may want to store, then you can use a union.
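The union alternative might look like the following sketch. It reuses the array_header and buffer_header types from the question's second example; the any_header union and the union_reuse_demo function are hypothetical names introduced here. The idea is that a union object is large enough and suitably aligned for any of its members, so one object can hold each type in turn.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t length; } array_header;
typedef struct { size_t capacity; size_t length; } buffer_header;

/* Hypothetical union: one storage area usable as either header type. */
typedef union {
    array_header  array;
    buffer_header buffer;
} any_header;

/* A union is at least as large as its largest member. */
static_assert(sizeof(any_header) >= sizeof(buffer_header),
              "any_header must be able to hold a buffer_header");

/* Stores each type in the same union object in turn.
   Returns 1 if both values read back intact, 0 otherwise. */
int union_reuse_demo(void) {
    any_header slot;
    int ok = 1;

    slot.array = (array_header){ 42u };        /* store by plain assignment */
    array_header a1 = slot.array;              /* read back through the same member */
    ok = ok && a1.length == 42u;

    slot.buffer = (buffer_header){ 42u, 0u };  /* reuse the same storage */
    buffer_header b1 = slot.buffer;
    ok = ok && b1.capacity == 42u && b1.length == 0u;

    return ok;
}
```

This sketch always reads the member that was stored last, so it needs neither memcpy() nor casts; the cost is that every candidate type must be listed in the union up front.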

    • what parts of the Standard allow me to reasonably assume that it can be done portably.

    With your dynamic allocation / memcpy() example, the relevant provisions are

    • C17 7.22.3.4/2: "The malloc function allocates space for an object whose size is specified by size"

    • C17 6.2.4/2: "An object exists, has a constant address, and retains its last-stored value throughout its lifetime."

    • C17 7.22.3/1: "The lifetime of an allocated object extends from the allocation until the deallocation."

    • C17 7.24.2.1/2: "The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1."

    Thus, in a program exhibiting only defined behavior, memcpy() faithfully copies all the specified bytes from the source object's representation to the destination object's representation. The allocated object retains those bytes unchanged until they are overwritten or its lifetime ends, which keeps them available for the second memcpy() to copy into some other object. Neither memcpy() call alters the byte sequence, and the allocated object preserves it in between, so in the end all three objects (the original, the allocated object, and the final destination) must contain the same byte sequence, up to the number of bytes copied.
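The byte-for-byte argument can be checked directly with memcmp(). The sketch below is illustrative only: round_trip_preserves_bytes is a hypothetical helper, and it uses a second allocated block as the final destination so that it works for an object of any type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies size bytes: object -> allocated storage -> second allocated block,
   then compares all three byte sequences.
   Returns 1 if they match, 0 if they differ, -1 if allocation fails. */
int round_trip_preserves_bytes(const void *object, size_t size) {
    assert(object != NULL && size > 0);

    unsigned char *storage = malloc(size);   /* the "allocated object" */
    unsigned char *copy = malloc(size);      /* stands in for the final destination */
    int result = -1;

    if (storage != NULL && copy != NULL) {
        memcpy(storage, object, size);       /* first memcpy: store */
        memcpy(copy, storage, size);         /* second memcpy: retrieve */
        result = memcmp(object, storage, size) == 0
              && memcmp(storage, copy, size) == 0;
    }

    free(storage);                           /* free(NULL) is a no-op */
    free(copy);
    return result;
}
```

For example, given buffer_header b0 = { 42u, 0u } from the question, round_trip_preserves_bytes(&b0, sizeof b0) should return 1 whenever both allocations succeed.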