By examining the implementation of the vector class in the GCC headers (stl_vector.h), I found the following two member functions inside the vector base implementation class (which I renamed in the example to Data_Class to avoid a long and weird name).
    void
    _M_copy_data(Data_Class const& __x) _GLIBCXX_NOEXCEPT
    {
      _M_start = __x._M_start;
      _M_finish = __x._M_finish;
      _M_end_of_storage = __x._M_end_of_storage;
    }

    void
    _M_swap_data(Data_Class& __x) _GLIBCXX_NOEXCEPT
    {
      // Do not use std::swap(_M_start, __x._M_start), etc as it loses
      // information used by TBAA.
      Data_Class __tmp;
      __tmp._M_copy_data(*this);
      _M_copy_data(__x);
      __x._M_copy_data(__tmp);
    }
The members (_M_start, _M_finish, _M_end_of_storage) are just pointers of some type and are the only members in the class.
The question here is: what is the reason for the comment in the second function?
And, possibly for the same reason, why not use std::swap on the two Data_Class objects, or at least an automatically generated operator=, to do the copying?
I feel like this could be done more simply.
For a std::vector object, the three contained pointers will never address the same memory as the pointers in another std::vector; each manages its own contiguous memory region. This insight can be used for type-based aliasing optimisations (TBAA is type-based alias analysis), which remain possible as long as the individual pointers are moved around in a way the compiler is able to follow and reason about.
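To make the "moved around in a way the compiler can follow" idea concrete, here is a minimal sketch of the same pattern outside libstdc++. PtrTriple, copy_data, swap_data and swap_exchanges_pointers are names I made up for illustration, and int* is an arbitrary choice of pointer type. It only shows the shape of the member-wise swap; it does not demonstrate or measure any particular optimisation.

```cpp
#include <cassert>

// Hypothetical stand-in for the vector base's data class: three raw pointers.
struct PtrTriple {
    int* start = nullptr;
    int* finish = nullptr;
    int* end_of_storage = nullptr;

    // Copy each pointer through a named member access, one store per member.
    void copy_data(PtrTriple const& other) noexcept {
        start = other.start;
        finish = other.finish;
        end_of_storage = other.end_of_storage;
    }

    // Swap via a temporary object, mirroring the shape of _M_swap_data.
    void swap_data(PtrTriple& other) noexcept {
        PtrTriple tmp;
        tmp.copy_data(*this);
        copy_data(other);
        other.copy_data(tmp);
    }
};

// Self-check: after swapping, each object holds the other's pointers.
inline bool swap_exchanges_pointers() {
    int a[4] = {};
    int b[2] = {};
    PtrTriple x;
    x.start = a; x.finish = a + 1; x.end_of_storage = a + 4;
    PtrTriple y;
    y.start = b; y.finish = b + 2; y.end_of_storage = b + 2;

    x.swap_data(y);

    assert(x.start == b && y.start == a);  // sanity check in debug builds
    return x.finish == b + 2 && x.end_of_storage == b + 2
        && y.finish == a + 1 && y.end_of_storage == a + 4
        && x.start == b && y.start == a;
}
```

Every load and store here goes through a member of a known struct type, which is the kind of access the comment in _M_swap_data appears to want to preserve.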
std::swap may be implemented using some in-place bitwise trickery such that the compiler can't perform that tracking (see here), so is best avoided.
Of course, there's nothing inherently preventing an optimiser from tracking whatever swap does and recognising it as the same logical operation as the __tmp-using code, but whoever wrote that comment and chose not to use std::swap probably observed or deduced that it wasn't optimised as well on at least one version of the compiler they used, at some optimisation level. Such weaker optimisation may well be common, though.
The same type of observation/reasoning may have led them to avoid a default operator=. For example, the implementation on their compiler may have been akin to a memcpy: if you treat the data as having char type then you're safe against aliasing bugs (the same way you can only safely sequence access to distinct union members if one is a char or char[]), but that's because aliasing optimisations are suppressed.
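As a rough sketch of the contrast in that last paragraph, the two functions below copy the same three pointers in two ways: byte-wise through std::memcpy, and member by member. RawPtrs, copy_bytes, copy_members and copies_agree are hypothetical names, and whether any given compiler's default operator= resembles either one is an assumption I haven't verified.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical three-pointer aggregate, similar in layout to Data_Class.
struct RawPtrs {
    int* start;
    int* finish;
    int* end_of_storage;
};

// memcpy is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<RawPtrs>::value,
              "RawPtrs must be trivially copyable");

// Byte-wise copy: always safe with respect to aliasing rules, because
// character-typed access may alias any object. The flip side is that the
// stores carry no "this is an int* member of RawPtrs" type information.
inline void copy_bytes(RawPtrs& dst, RawPtrs const& src) noexcept {
    std::memcpy(&dst, &src, sizeof(RawPtrs));
}

// Member-wise copy: each store is a typed access to a named member.
inline void copy_members(RawPtrs& dst, RawPtrs const& src) noexcept {
    dst.start = src.start;
    dst.finish = src.finish;
    dst.end_of_storage = src.end_of_storage;
}

// Self-check: both approaches produce the same pointer values.
inline bool copies_agree() {
    int buf[3] = {};
    RawPtrs src{buf, buf + 1, buf + 3};
    RawPtrs a{nullptr, nullptr, nullptr};
    RawPtrs b{nullptr, nullptr, nullptr};

    copy_bytes(a, src);
    copy_members(b, src);

    assert(a.start == buf);  // sanity check in debug builds
    return a.start == b.start
        && a.finish == b.finish
        && a.end_of_storage == b.end_of_storage;
}
```

Both functions yield the same result; the difference the answer is pointing at is only in what the optimiser is allowed to assume about the accesses.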