Tags: c, algorithm, buffer, memcpy

memcpy correct implementation approach


I've found the following implementation of memcpy (an interview question, where the iteration count is about size/4):

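#include <stddef.h>
#include <stdint.h>

/* Copies size bytes from src to dest: first size / 4 whole 32-bit words,
   then the remaining 0-3 bytes one at a time. Assumes both buffers are
   suitably aligned for uint32_t and that they do not overlap. */
void *memcpy(void* dest, const void* src, size_t size)
{
    uint8_t *pdest = (uint8_t*) dest;
    const uint8_t *psrc = (const uint8_t*) src;

    /* Bulk copy: one uint32_t per iteration. */
    size_t loops = (size / sizeof(uint32_t));
    for(size_t index = 0; index < loops; ++index)
    {
        *((uint32_t*)pdest) = *((const uint32_t*)psrc);
        pdest += sizeof(uint32_t);
        psrc += sizeof(uint32_t);
    }

    /* Trailing bytes when size is not a multiple of sizeof(uint32_t). */
    loops = (size % sizeof(uint32_t));
    for (size_t index = 0; index < loops; ++index)
    {
        *pdest = *psrc;
        ++pdest;
        ++psrc;
    }
    return dest;
}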

I'm not sure I understand it:

1) Why define pdest and psrc as uint8_t * and then cast them to uint32_t * here:

*((uint32_t*)pdest) = *((uint32_t*)psrc);

I think pdest and psrc should have been declared as uint32_t * from the beginning. What am I missing?

2) It looks to me like there is a problem with this implementation: if src = 0x100 and dst = 0x104, and the source buffer originally looks like this:

-------------------------
|  6  |  8  |  7  |  1  |
-------------------------    
0x100  0x104 0x108 0x10C

After the copy runs, it will look like this:

-------------------------
|  6  |  6  |  6  |  6  |.....
-------------------------
0x100  0x104 0x108 0x10C

whereas I would expect the result to be the following memory layout:

-------------------------
|  6  |  6  |  8  |  7  |....
-------------------------
0x100  0x104 0x108 0x10C
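A minimal sketch that reproduces the second diagram, assuming a uint32_t array stands in for the memory from 0x100 onward. naive_copy is a hypothetical name for the question's two loops, renamed so it does not collide with the standard library's memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the question's forward copy, word by word and then
   byte by byte, with no check for overlapping buffers. */
void naive_copy(void *dest, const void *src, size_t size)
{
    uint8_t *pdest = (uint8_t *)dest;
    const uint8_t *psrc = (const uint8_t *)src;

    for (size_t i = 0; i < size / sizeof(uint32_t); ++i) {
        /* When dest == src + 4, this store overwrites the word that the
           next iteration reads as its source. */
        *(uint32_t *)pdest = *(const uint32_t *)psrc;
        pdest += sizeof(uint32_t);
        psrc += sizeof(uint32_t);
    }
    for (size_t i = 0; i < size % sizeof(uint32_t); ++i)
        *pdest++ = *psrc++;
}
```

With uint32_t words[5] = {6, 8, 7, 1, 0}, calling naive_copy(&words[1], &words[0], 4 * sizeof(uint32_t)) leaves every element equal to 6: each store lands on the word that the next iteration reads as its source.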

Solution

  • This memcpy() suffers from another problem: what happens if one or both buffers are not on a proper (4-byte) boundary? Misaligned 32-bit accesses can hurt performance significantly or, on some architectures, fault so that the code does not run at all. Another common problem (handled here by the second loop) is dealing with buffers whose length is not a multiple of the width of the native (uint32_t) type. The reason your example uses uint8_t pointers (and casts to uint32_t * only where needed) is so that the trailing bytes can be copied without casting. You could just as well declare the pointers as uint32_t * and cast only for the trailing bytes; it makes no difference which part of the transfer does the casting. To account for buffer alignment, you would expect some code at the start that copies the leading non-aligned bytes one at a time until an aligned boundary is reached.

    The C standard leaves the behavior of memcpy() undefined when the source and destination overlap, so the problem you label as number two is not a problem for memcpy(). If this code were instead used to implement memmove(), which must handle overlapping buffers correctly, the problem would be real.
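A minimal sketch of both points, assuming uintptr_t is available, that a 4-byte boundary is enough alignment for uint32_t, and (as in the question's code) that the target compiler tolerates accessing the buffers through uint32_t * casts. aligned_copy and my_memmove are hypothetical names. aligned_copy copies leading bytes until the destination is aligned, then whole words, then the trailing bytes; because aligning dest only aligns src as well when both have the same offset modulo 4, it falls back to a byte loop otherwise. my_memmove adds the one thing memcpy() is allowed to skip: when the destination starts inside the source range, it copies backwards.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_MASK ((uintptr_t)(sizeof(uint32_t) - 1))

/* Hypothetical name. Forward copy: leading bytes until dest is on a 4-byte
   boundary, then whole uint32_t words, then trailing bytes. Word copies are
   used only when dest and src have the same offset modulo 4, because only
   then does aligning dest also align src. */
void *aligned_copy(void *dest, const void *src, size_t size)
{
    uint8_t *pdest = (uint8_t *)dest;
    const uint8_t *psrc = (const uint8_t *)src;

    if ((((uintptr_t)pdest ^ (uintptr_t)psrc) & WORD_MASK) == 0) {
        /* Leading non-aligned bytes. */
        while (size > 0 && ((uintptr_t)pdest & WORD_MASK) != 0) {
            *pdest++ = *psrc++;
            --size;
        }
        /* Both pointers are now 4-byte aligned: copy whole words. */
        while (size >= sizeof(uint32_t)) {
            *(uint32_t *)pdest = *(const uint32_t *)psrc;
            pdest += sizeof(uint32_t);
            psrc += sizeof(uint32_t);
            size -= sizeof(uint32_t);
        }
    }
    /* Trailing bytes, or the whole buffer when the offsets differ. */
    while (size > 0) {
        *pdest++ = *psrc++;
        --size;
    }
    return dest;
}

/* Hypothetical name. memmove-style copy that picks the copy direction. */
void *my_memmove(void *dest, const void *src, size_t size)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    /* A forward copy is safe when dest starts at or before src, or when the
       buffers do not overlap: no source byte is overwritten before it is read. */
    if (d <= s || d - s >= size)
        return aligned_copy(dest, src, size);

    /* dest lies inside [src, src + size), e.g. src = 0x100, dest = 0x104:
       copy backwards, one byte at a time for simplicity. */
    uint8_t *pdest = (uint8_t *)dest;
    const uint8_t *psrc = (const uint8_t *)src;
    for (size_t i = size; i > 0; --i)
        pdest[i - 1] = psrc[i - 1];
    return dest;
}
```

For the question's layout, with uint32_t words[5] = {6, 8, 7, 1, 0}, my_memmove(&words[1], &words[0], 4 * sizeof(uint32_t)) takes the backward branch and leaves {6, 6, 8, 7, 1}, the layout from the third diagram. The backward branch is byte-wise only to keep the sketch short; it could be aligned and copied word by word from the end in the same way as the forward path.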