I want to be able to handle pointers to objects with an array member of unknown size, and access that array through a type-erased pointer to their common first member. My current attempt is the following:
#include <cstddef>

struct node_base
{
    node_base* next;
    int size;
};

template <int n>
struct node
{
    node_base base;
    char data[n];
    node() : base{nullptr, n} {
        static_assert(offsetof(node, data) == sizeof(node_base));
    }
};

void process_queue(node_base* head)
{
    while (head)
    {
        for (int i = 0; i < head->size; ++i)
        {
            *(reinterpret_cast<char*>(reinterpret_cast<char*>(head) + sizeof(node_base)) + i) = i;
        }
        head = head->next;
    }
}

int main()
{
    node<3> a{};
    node<4> b{};
    node<2> c{};
    c.base.next = &b.base;
    a.base.next = &c.base;
    process_queue(&a.base);
    return a.data[2] + c.data[1];
}
This code builds up a queue-like structure (nodes a, b and c pointing to each other as "a -> c -> b"), and passes a pointer to the first element to process_queue. That function will then traverse the queue and access the node<n>::data array stored directly after the node_base member, and write the values 0...n-1 into its entries.
The challenge is that the nodes have different types, so the queue's next pointers point to the node_base members of the actual nodes, and I need some way to get to the actual data from there.
Although this seems to work (Godbolt) in the sense that it successfully returns the expected value of 3, I am not sure whether this is allowed.
Assuming I know by some method that the pointer cur points to the first member of an object with an array of size cur->size, is it legal to access the elements of the node<n>::data array by means of the code above? If not, can it be made legal without making sizeof(node<n>) larger?
Going strictly by the current standard, it is already undefined behavior, because the pointer arithmetic here:
reinterpret_cast<char*>(head) + sizeof(node_base)
is undefined. head is a pointer to a node_base object, which is not pointer-interconvertible with any char object at that location. Therefore reinterpret_cast<char*>(head) will also be a pointer to the same node_base object. As a consequence, pointer arithmetic is undefined because the pointed-to type of the expression (char) is not similar to the actual type of the pointed-to object (node_base).
However, your intent with the cast to char* here is to change the pointer value. You intend to obtain a pointer to the object representation of the node<n> object. Casts to char* are commonly used to access the object representation, but the standard doesn't actually provide for that.
The proposal P1839 attempts to incorporate this intended behavior into the standard. With its current wording in revision P1839R5 it would still not make your program well-defined, for multiple reasons:
First, only reinterpret_cast<unsigned char*> could be used to obtain a pointer to the object representation, as noted in the limitations section of the proposal.
Even with unsigned char, there are still issues under the proposal:
Your classes happen to be standard-layout. That's a necessary condition for this to work at all. If they weren't standard-layout, then there generally wouldn't be any way to get from a pointer to one member to a pointer to another member.
But being standard-layout guarantees that the node<n> object is pointer-interconvertible with its first member subobject. As a consequence, under the proposal it is left open whether reinterpret_cast<unsigned char*>(head) will produce a pointer to the first element of the object representation of the node_base member or of the node<n> object. This is noted as an open issue in the proposal.
Assuming the cast did, however, produce a pointer to the object representation of the node<n> object as you intend, then the next question would be whether reinterpret_cast<unsigned char*>(head) + sizeof(node_base) + i would be a pointer into the object representation of the char array member of node<n> as well. I am not sure what the proposal intends for this.
But even if that wasn't an issue, the proposal defines only how it is possible to read from the object representation. Writing to it is out-of-scope and still UB under the proposal.
So at the very least you would need to keep the outer reinterpret_cast<char*> and wrap it in a call to std::launder in order to obtain a pointer to the char object itself (rather than its object representation or the object representation of the node<n> object).