With regard to the issue of copy vs. memcpy vs. memmove (excellent info here, btw.), I have been reading up, and it seems to me that, contrary to what is commonly said, for example at cppreference (note: memcpy has been changed to memmove there since I took this quote) --
Notes
In practice, implementations of std::copy avoid multiple assignments and use bulk copy functions such as std::memcpy if the value type is TriviallyCopyable
-- std::copy (and likewise std::copy_backward) cannot be implemented in terms of memcpy, because for std::copy only the beginning of the destination range must not fall into the source range, whereas for memcpy the two ranges must not overlap at all.
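To make the difference in preconditions concrete, here is a minimal sketch of a "shift left" within one buffer. The helper names `shift_left_with_copy` and `shift_left_with_memmove` are made up for illustration, and both assume 0 < k <= s.size(). The source and destination ranges overlap, but the destination begins before the source, so std::copy is well-defined here, as is memmove; passing the same arguments to memcpy would be undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: drop the first k characters by copying the tail
// over the head of the same buffer. Assumes 0 < k <= s.size().
// d_first (s.begin()) lies outside [first, last) = [s.begin() + k, s.end()),
// which is all std::copy requires, even though the ranges overlap.
std::string shift_left_with_copy(std::string s, std::size_t k) {
    std::copy(s.begin() + k, s.end(), s.begin());
    return s;
}

// Same operation with memmove, which permits any overlap.
// Using memcpy with these arguments would violate its no-overlap rule.
std::string shift_left_with_memmove(std::string s, std::size_t k) {
    std::memmove(&s[0], &s[0] + k, s.size() - k);
    return s;
}
```

The bytes past the copied range are left as they were, so shifting "abcdefgh" by 2 leaves the trailing "gh" in place.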
Looking at Visual C++'s implementation (see the xutility header), we can also observe that VC++ uses memmove, but memmove in turn has more relaxed requirements than std::copy:
... The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array ...
So it would appear that implementing std::copy in terms of memcpy is not possible, while using memmove is actually a pessimization (a tiny one, possibly not measurable, but still).
To come back to the question(s): Is my summary correct? Is this a problem anywhere? Regardless of what's specified, is there even a possible practical implementation of memcpy that would not also fulfill the requirements of std::copy, i.e. are there memcpy implementations that break when the ranges partially overlap as allowed by std::copy?
If the question is whether it's possible to encounter an efficient memcpy implementation with enough undefined behavior that it cannot be trusted with overlapping ranges, then the answer is yes. :-)
Consider one possible implementation of memcpy on the Power(PC) architecture: the lmw instruction loads multiple consecutive words from memory into consecutive registers (the register range is given as an argument), and stmw then stores the supplied register range back to memory. Thus, we are talking about roughly 100/200 bytes (32-bit/64-bit CPU) buffered by the CPU during a single memcpy iteration, which is plenty of data to spoil the target range if it overlaps the source range, especially considering that the CPU makes no promises about the relative order of individual loads and stores.
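Independently of the PowerPC details, a simpler way to see that such implementations can exist is a sketch of a routine that copies from the last byte to the first. The name `backward_memcpy` is hypothetical and not taken from any real library. It satisfies memcpy's contract, since for non-overlapping ranges the copy direction is unobservable, yet it corrupts the data in exactly the overlap case that std::copy permits (destination starting before the source).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical memcpy-like routine that copies from the last byte to the
// first. Correct for non-overlapping ranges, which is all memcpy promises.
void* backward_memcpy(void* dest, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    while (n > 0) {
        --n;
        d[n] = s[n];
    }
    return dest;
}

// Shift left by k using std::copy: well-defined, because the destination
// begins before the source range. Assumes 0 < k <= s.size().
std::string shift_left_std_copy(std::string s, std::size_t k) {
    std::copy(s.begin() + k, s.end(), s.begin());
    return s;
}

// The same call pattern routed through the backward routine: source bytes
// are overwritten before they are read, so the result is wrong.
std::string shift_left_backward(std::string s, std::size_t k) {
    backward_memcpy(&s[0], &s[0] + k, s.size() - k);
    return s;
}
```

With the input "abcdefgh" and k = 2, std::copy yields "cdefghgh", while the backward routine yields "ghghghgh", because the tail is smeared over the source before it has been read.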