
Performance penalty of using clone_from_slice() instead of copy_from_slice()?

In Rust, there are two methods to update the content of a slice from another slice: clone_from_slice() and copy_from_slice(). The behavior of these two functions is unsurprising: the first clones each element and requires the type to implement Clone, while the second copies and requires the type to implement Copy.
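For reference, here is a minimal sketch of how the two methods are typically called. The helper names `clone_strings` and `copy_bytes` are made up for illustration; both slice methods panic if the two slices differ in length, so the destination is sized to match the source first.

```rust
// Hypothetical helper: String is Clone but not Copy, so only
// clone_from_slice is available for it.
fn clone_strings(src: &[String]) -> Vec<String> {
    let mut dst = vec![String::new(); src.len()];
    dst.clone_from_slice(src);
    dst
}

// Hypothetical helper: u8 is Copy, so copy_from_slice can be used.
fn copy_bytes(src: &[u8]) -> Vec<u8> {
    let mut dst = vec![0u8; src.len()];
    dst.copy_from_slice(src);
    dst
}

fn main() {
    let names = vec![String::from("a"), String::from("b")];
    println!("{:?}", clone_strings(&names));
    println!("{:?}", copy_bytes(&[1, 2, 3]));
}
```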

However, it surprises me that the documentation for clone_from_slice says this: "If T implements Copy, it can be more performant to use copy_from_slice." I would not expect a performance difference here. If T implements Copy, then .clone() is required to be equivalent to copying bits; and since the compiler knows what type T is, it should be able to figure out that it can do a bitwise copy even if I use clone_from_slice.

So where does the performance inefficiency arise from?


  • TL;DR: Check the source of clone_from_slice: it visits every element of the slice and calls clone on each one, while copy_from_slice copies all the bits at once with memcpy.

    Note: As of Rust 1.52.0, clone_from_slice is implemented via specialization. If you call clone_from_slice with a Copy type, it calls copy_from_slice internally. (reference)
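The difference described in the TL;DR can be sketched roughly as follows. This is a simplified illustration, not the actual standard library source: `clone_elementwise` and `copy_bitwise` are hypothetical names, and the real methods are defined on slices themselves.

```rust
use std::ptr;

// Hypothetical helper: roughly what an element-by-element clone looks like.
// Each element goes through its own Clone call.
fn clone_elementwise<T: Clone>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "slices must have the same length");
    for (d, s) in dst.iter_mut().zip(src) {
        d.clone_from(s);
    }
}

// Hypothetical helper: roughly what a bulk bitwise copy looks like.
// The whole range is copied in one memcpy-style operation.
fn copy_bitwise<T: Copy>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "slices must have the same length");
    // SAFETY: both slices are valid for dst.len() elements, T: Copy means a
    // bitwise copy is a valid duplicate, and a &mut slice cannot overlap
    // a shared slice.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), dst.len());
    }
}

fn main() {
    let src = [1u32, 2, 3, 4];
    let mut a = [0u32; 4];
    let mut b = [0u32; 4];
    clone_elementwise(&mut a, &src);
    copy_bitwise(&mut b, &src);
    assert_eq!(a, b);
}
```

For a type like u32 the optimizer may well compile both helpers to the same machine code; the sketch only shows where the two approaches differ at the source level.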

    If T implements Copy, then .clone() is required to be equivalent to copying bits

    Even if every Copy type implemented Clone by simply performing the copy, clone_from_slice would still traverse the slice and copy the elements one at a time.

    But that proposition does not hold in general. It is true for primitives, but not for cases like the one below:

    #[derive(Copy)] // X is a Copy type whose Clone is written by hand
    struct X;
    impl Clone for X {
        fn clone(&self) -> Self {
            // do some heavy or light operation (depends on the logic)
            X
        }
    }

    Clone can be implemented with arbitrary logic, whereas duplicating a value of a Copy type always just copies its bits.
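To make this concrete, here is a small sketch that counts how often a hand-written clone runs. The type `Tracked`, the counter `CLONE_CALLS`, and the helper functions are all hypothetical. An explicit loop calling .clone() is used instead of clone_from_slice, because on newer Rust versions clone_from_slice may be specialized for Copy types. Note that the standard library documentation says Copy types should have a trivial Clone implementation, so a clone with side effects like this one is for demonstration only.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread counter of clone() invocations.
    static CLONE_CALLS: Cell<usize> = Cell::new(0);
}

#[derive(Copy, Debug, PartialEq)]
struct Tracked(u8);

// Hand-written Clone on a Copy type: it records each call, then copies.
impl Clone for Tracked {
    fn clone(&self) -> Self {
        CLONE_CALLS.with(|c| c.set(c.get() + 1));
        *self
    }
}

fn clone_calls() -> usize {
    CLONE_CALLS.with(|c| c.get())
}

// Returns how many times clone() ran while filling dst element by element.
fn calls_in_explicit_loop(dst: &mut [Tracked], src: &[Tracked]) -> usize {
    let before = clone_calls();
    for (d, s) in dst.iter_mut().zip(src) {
        *d = s.clone();
    }
    clone_calls() - before
}

// Returns how many times clone() ran during copy_from_slice.
fn calls_in_copy_from_slice(dst: &mut [Tracked], src: &[Tracked]) -> usize {
    let before = clone_calls();
    dst.copy_from_slice(src);
    clone_calls() - before
}

fn main() {
    let src = [Tracked(1), Tracked(2), Tracked(3)];
    let mut dst = [Tracked(0); 3];
    println!("explicit loop: {} clone calls", calls_in_explicit_loop(&mut dst, &src));
    println!("copy_from_slice: {} clone calls", calls_in_copy_from_slice(&mut dst, &src));
}
```

The explicit loop runs the custom logic once per element, while copy_from_slice never calls clone at all.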

    If T implements Copy, it can be more performant to use copy_from_slice

    The important thing here is that the documentation says "it can be", not "it will be". That leaves several possibilities:

    • The Clone implementation directly uses the Copy implementation. For basic types such as primitives, the optimizer may emit a memcpy instead of a per-element loop. In that case the proposition does not hold, because neither method is faster than the other.

    • The Clone implementation directly uses the Copy implementation. For complex types, the traversal issue described above makes the proposition correct. (I've edited the example from @kmdreko to use a slightly more complex structure; please check the result on godbolt.)

    • The Clone implementation is custom and the type is Copy. This makes the proposition correct: even if the custom implementation is inexpensive, using memcpy for large slices may still be more beneficial.
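One way to check which of these cases applies to a given type is to measure it. Below is a rough timing sketch, assuming a made-up Copy struct `Sample` and hypothetical helpers `make_samples`, `fill_with_clone`, and `fill_with_copy`. It is not a rigorous benchmark: results depend on the compiler version, optimization level, and hardware, and on Rust versions where clone_from_slice is specialized for Copy types the two timings may be indistinguishable.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical Copy type that is a bit more complex than a primitive.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Sample {
    id: u32,
    weight: f32,
    flags: [u8; 8],
}

fn make_samples(n: usize) -> Vec<Sample> {
    (0..n)
        .map(|i| Sample { id: i as u32, weight: i as f32, flags: [i as u8; 8] })
        .collect()
}

// Times a single clone_from_slice call.
fn fill_with_clone(dst: &mut [Sample], src: &[Sample]) -> Duration {
    let start = Instant::now();
    dst.clone_from_slice(black_box(src));
    black_box(&*dst); // keep the optimizer from discarding the work
    start.elapsed()
}

// Times a single copy_from_slice call.
fn fill_with_copy(dst: &mut [Sample], src: &[Sample]) -> Duration {
    let start = Instant::now();
    dst.copy_from_slice(black_box(src));
    black_box(&*dst);
    start.elapsed()
}

fn main() {
    let src = make_samples(1_000_000);
    let mut dst = vec![Sample::default(); src.len()];
    println!("clone_from_slice: {:?}", fill_with_clone(&mut dst, &src));
    println!("copy_from_slice:  {:?}", fill_with_copy(&mut dst, &src));
}
```

Build with optimizations (for example `cargo run --release`) before drawing any conclusions, and prefer inspecting the generated assembly when the timings are close.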