I'm trying to implement an array-like container with some special requirements and a subset of the std::vector interface. Here is a code excerpt:
template<typename Type>
class MyArray
{
public:
    explicit MyArray(const uint32_t size) : storage(new char[size * sizeof(Type)]), maxElements(size) {}
    MyArray(const MyArray&) = delete;
    MyArray& operator=(const MyArray&) = delete;
    MyArray(MyArray&& op) { /* some code */ }
    MyArray& operator=(MyArray&& op) { /* some code */ }
    ~MyArray() { if (storage != nullptr) delete[] storage; /* No explicit destructors. Let it go. */ }

    Type* data() { return reinterpret_cast<Type*>(storage); }
    const Type* data() const { return reinterpret_cast<const Type*>(storage); }

    template<typename... Args>
    void emplace_back(Args&&... args)
    {
        assert(current < maxElements);
        new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
        ++current;
    }

private:
    char* storage = nullptr;
    uint32_t maxElements = 0;
    uint32_t current = 0;
};
It works perfectly well on my system, but dereferencing a pointer returned by data seems to violate strict aliasing rules. The same concern applies to a naive implementation of the subscript operator, iterators, etc.
So what is the proper way to implement containers backed by arrays of char without breaking strict aliasing rules? As far as I understand, using std::aligned_storage will only provide proper alignment, but will not save the code from being broken by compiler optimizations that rely on strict aliasing. Also, I don't want to use -fno-strict-aliasing and similar flags due to performance considerations.
For example, consider the subscript operator (non-const version only, for brevity), which is a classic code snippet from articles about UB in C++:
Type& operator[](const uint32_t idx)
{
    Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr)); // Cast is OK.
    return *ptr; // Dereference is UB.
}
What is the proper way to implement it without any risk of finding my program broken? How is it implemented in standard containers? Is there any cheating with undocumented compiler intrinsics in all compilers?
Sometimes I see code with two static casts through void* instead of one reinterpret cast:
Type* ptr = static_cast<Type*>(static_cast<void*>(storage + idx * sizeof(ptr)));
How is it better than reinterpret cast? To me, it does not solve any problems and just looks overcomplicated.
but dereferencing a pointer returned by data seems to violate strict aliasing rules
I disagree.
Both char* storage and a pointer returned by data() point to the same region of memory.
This is irrelevant. Multiple pointers pointing to the same object do not violate aliasing rules.
Moreover, subscript operator will ... dereference a pointer of incompatible type, which is UB.
But the object isn't of an incompatible type. In emplace_back, you use placement new to construct objects of Type in the memory. Assuming that no code path can bypass this placement new, and therefore that the pointer used in the subscript operator points at one of these objects, dereferencing the Type* pointer is well defined, because it points to an object of Type, which is compatible.
This is what is relevant for pointer aliasing: The type of the object in memory, and the type of the pointer that is dereferenced. Any intermediate pointer that the dereferenced pointer was converted from is irrelevant to aliasing.
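To illustrate the point above, here is a minimal sketch. The Point type and the function name read_through_cast are hypothetical and not part of the question's code. It constructs an object in a char buffer with placement new and then reads it through a pointer obtained by casting the char pointer:

```cpp
#include <cassert>
#include <new>

// Hypothetical element type, used only for this illustration.
struct Point { int x; int y; };

// Constructs a Point inside a raw char buffer and reads it back through Point*.
inline int read_through_cast()
{
    // Raw storage, suitably aligned for Point.
    alignas(Point) char buffer[sizeof(Point)];

    // Placement new starts the lifetime of a Point object inside the buffer.
    Point* created = new (buffer) Point{3, 4};

    // The pointer was converted from char*, but the object at that address
    // is a Point, so the access below uses the object's own type.
    Point* viaCast = reinterpret_cast<Point*>(buffer);
    assert(viaCast == created);

    // Point is trivially destructible, so no destructor call is needed here.
    return viaCast->x + viaCast->y;
}
```

The intermediate char* plays no part in the aliasing question here; what matters is that a Point lives at the address being read.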
Note that your destructor does not call the destructors of the objects constructed within storage, so if Type isn't trivially destructible, then the behaviour is undefined.
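One possible way to address this, sketched under assumptions: the class name FixedBuffer, the helper type Tracked, and the function count_destructor_calls are made up for illustration, and the class is a reduced version of MyArray without move operations. The destructor destroys every constructed element before releasing the storage:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical element type that counts how many times its destructor runs.
struct Tracked
{
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

template<typename Type>
class FixedBuffer // hypothetical, reduced version of MyArray
{
public:
    explicit FixedBuffer(std::size_t size)
        : storage(new char[size * sizeof(Type)]), maxElements(size) {}
    FixedBuffer(const FixedBuffer&) = delete;
    FixedBuffer& operator=(const FixedBuffer&) = delete;

    ~FixedBuffer()
    {
        Type* first = reinterpret_cast<Type*>(storage);
        // Destroy the constructed elements in reverse order of construction,
        // then release the raw storage.
        while (current > 0)
            first[--current].~Type();
        delete[] storage;
    }

    template<typename... Args>
    void emplace_back(Args&&... args)
    {
        assert(current < maxElements);
        new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
        ++current;
    }

private:
    char* storage = nullptr;
    std::size_t maxElements = 0;
    std::size_t current = 0;
};

// Builds a buffer with three elements and reports how many were destroyed.
inline int count_destructor_calls()
{
    Tracked::destroyed = 0;
    {
        FixedBuffer<Tracked> buffer(4);
        buffer.emplace_back();
        buffer.emplace_back();
        buffer.emplace_back();
    } // ~FixedBuffer runs here
    return Tracked::destroyed;
}
```

Only the elements that were actually constructed are destroyed; the unused tail of the storage is never touched as a Type.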
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr));
The sizeof is wrong. What you need is sizeof(Type), or sizeof *ptr. Or more simply:
auto ptr = reinterpret_cast<Type*>(storage) + idx;
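To make the difference concrete, here is a small sketch. The Wide type and the three function names are hypothetical; Wide is chosen to be larger than a pointer on common platforms. It compares the byte offset computed with sizeof(ptr) against the one computed with the element size, and reads an element using the typed pointer arithmetic shown above:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type that is larger than a pointer on common platforms.
struct Wide { long long a; long long b; long long c; };

// Byte offset computed by the original expression: sizeof(ptr) is the size
// of a pointer, not the size of an element.
inline std::size_t offset_with_pointer_size(std::size_t idx)
{
    Wide* ptr = nullptr;
    return idx * sizeof(ptr);
}

// Byte offset that matches the element layout.
inline std::size_t offset_with_element_size(std::size_t idx)
{
    return idx * sizeof(Wide);
}

// Constructs three elements in char storage and reads member b of element idx
// (idx must be 0, 1 or 2) using typed pointer arithmetic.
inline long long read_b(std::size_t idx)
{
    alignas(Wide) char storage[3 * sizeof(Wide)];
    for (std::size_t i = 0; i < 3; ++i)
        new (storage + i * sizeof(Wide)) Wide{0, static_cast<long long>(10 * i), 0};

    auto ptr = reinterpret_cast<Wide*>(storage) + idx;
    return ptr->b;
}
```

With the pointer-sized offset, any index other than 0 would land in the middle of an element whenever sizeof(Type) differs from the size of a pointer.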
Sometimes I see code with two static casts through void* instead of one reinterpret cast: How is it better than reinterpret cast?
I can't think of any situation where the behaviour would be different.
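As far as I know, since C++11 the standard defines a reinterpret_cast between object pointer types in terms of exactly this pair of static_casts, which is consistent with the two forms behaving the same. A minimal sketch (the function name casts_agree is made up for illustration) that checks both forms yield the same pointer:

```cpp
#include <cassert>
#include <new>

// Returns true if both cast forms produce the same pointer and that pointer
// reads back the int that was constructed in the storage.
inline bool casts_agree()
{
    alignas(int) char storage[sizeof(int)];
    new (storage) int(42);

    int* viaReinterpret = reinterpret_cast<int*>(storage);
    int* viaStatic = static_cast<int*>(static_cast<void*>(storage));

    return viaReinterpret == viaStatic && *viaStatic == 42;
}
```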