I've found the following implementation of memcpy (an interview question, where the iteration count is roughly size/4):
void memcpy(void* dest, void* src, int size)
{
    uint8_t *pdest = (uint8_t*) dest;
    uint8_t *psrc = (uint8_t*) src;
    int loops = (size / sizeof(uint32_t));
    for(int index = 0; index < loops; ++index)
    {
        *((uint32_t*)pdest) = *((uint32_t*)psrc);
        pdest += sizeof(uint32_t);
        psrc += sizeof(uint32_t);
    }
    loops = (size % sizeof(uint32_t));
    for (int index = 0; index < loops; ++index)
    {
        *pdest = *psrc;
        ++pdest;
        ++psrc;
    }
}
And I am not sure I understand it:
1) Why define uint8_t *pdest and uint8_t *psrc, and then cast them to uint32_t* in *((uint32_t*)pdest) = *((uint32_t*)psrc);? I think pdest and psrc should have been defined as uint32_t* from the beginning. What am I missing?
2) It looks to me like there is a problem with this implementation:
if src = 0x100 and dst = 0x104, and the source originally looks like this:
-------------------------
| 6 | 8 | 7 | 1 |
-------------------------
0x100 0x104 0x108 0x10C
then after execution it will look like this:
-------------------------
| 6 | 6 | 6 | 6 |.....
-------------------------
0x100 0x104 0x108 0x10C
even though the following memory layout should be the result:
-------------------------
| 6 | 6 | 8 | 7 |....
-------------------------
0x100 0x104 0x108 0x10C
This memcpy() suffers from another problem: what happens if one or both buffers are not aligned on a proper boundary? Misaligned access can hurt performance significantly or, on some architectures, keep the code from running at all. Another common problem (though handled here) is dealing with buffers whose length is not a multiple of the width of the native (uint32_t) type. The reason your example uses uint8_t pointers (and casts as needed) is so that the trailing bytes can be copied without a cast. Whether you cast for the bulk of the transfer or only for the trailing bytes makes no difference. To account for buffer alignment, you would expect an early step that copies the initial unaligned bytes until alignment is established.
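The alignment step described above could look roughly like the following. This is only a sketch under stated assumptions: the name copy_aligned is hypothetical (chosen so it does not clash with the standard memcpy), word copies are attempted only when both pointers have the same offset from a 4-byte boundary, and, like the code in the question, it type-puns through uint32_t*, which C's aliasing rules do not formally permit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a standard function. */
void *copy_aligned(void *dest, const void *src, size_t size)
{
    uint8_t *pdest = (uint8_t *)dest;
    const uint8_t *psrc = (const uint8_t *)src;
    const size_t word = sizeof(uint32_t);

    /* Word copies only make sense if both pointers can be aligned at the
       same time, i.e. they have the same offset from a word boundary. */
    if ((uintptr_t)pdest % word == (uintptr_t)psrc % word) {
        /* Head: copy single bytes until a word boundary is reached. */
        while (size > 0 && (uintptr_t)pdest % word != 0) {
            *pdest++ = *psrc++;
            --size;
        }
        /* Body: copy one aligned 32-bit word per iteration.
           Same type punning as in the question's code. */
        while (size >= word) {
            *(uint32_t *)pdest = *(const uint32_t *)psrc;
            pdest += word;
            psrc += word;
            size -= word;
        }
    }

    /* Tail (or the whole buffer if the offsets differ): byte by byte. */
    while (size > 0) {
        *pdest++ = *psrc++;
        --size;
    }
    return dest;
}
```

When the two offsets differ, this sketch simply falls back to a byte loop; a tuned implementation would probably do something smarter there, but that is beyond what the question asks.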
The memcpy() function is not guaranteed to behave in a defined manner when the source and destination overlap; for that reason, the problem you label as number two is not a problem. If this code were used in an implementation of memmove() instead of memcpy(), then the problem would be real.
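For comparison, here is a minimal sketch of how a memmove-style routine can cope with the overlap from the question: when the destination starts after the source, it copies from the end backwards so that no byte is overwritten before it has been read. The name move_bytes is hypothetical, and the byte-at-a-time loops are kept deliberately simple rather than fast.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the library memmove(). */
void *move_bytes(void *dest, const void *src, size_t size)
{
    uint8_t *pdest = (uint8_t *)dest;
    const uint8_t *psrc = (const uint8_t *)src;

    /* Compare the addresses as integers to decide the copy direction. */
    if ((uintptr_t)pdest < (uintptr_t)psrc) {
        /* Destination is below the source: a forward copy is safe. */
        while (size > 0) {
            *pdest++ = *psrc++;
            --size;
        }
    } else {
        /* Destination is at or above the source: copy backwards so the
           overlapping source bytes are read before they are overwritten. */
        while (size > 0) {
            --size;
            pdest[size] = psrc[size];
        }
    }
    return dest;
}
```

With the layout from the question, a four-word buffer holding 6, 8, 7, 1, calling move_bytes on the second word as destination and the first word as source for three words leaves 6, 6, 8, 7, which is the result the question expected.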