I am working on an embedded platform where heap allocation is discouraged. I also have circular dependencies during construction. Given these constraints my team designed a static allocator class which is used to allocate memory in the .bss
section and then construct the object in a deferred fashion.
The issue we face is during the deferred construction the compiler generated code tries to reference data in the statically allocated memory that hasn't been constructed yet - the data are zero on our platform when unconstructed - which causes a null pointer dereference crashing the system.
The crashes can be resolved by reordering the construction order of the classes. Unfortunately I haven't been able to create a minimum reproduction of the issue. Additionally the problem gets worse and harder to manage when virtual inheritance is involved.
We have experienced the issue targeting armclang and visual studio compilers so it seems like we are likely doing something out of the C++ specification.
Static Allocator Code:
template <class UnderlyingType, typename... Args>
class StaticAllocator
{
private:
typedef std::uint64_t BaseDataType;
// Define a tuple of the variadic template parameters with the references removed
using TupleWithRefsRemoved = std::tuple<typename std::remove_reference<Args>::type...>;
// A function that strips return the ref-less template arguments
template <typename... T>
TupleWithRefsRemoved removeRefsFromTupleMembers(std::tuple<T...> const& t)
{
return TupleWithRefsRemoved{ t };
}
public:
StaticAllocator()
{
const auto ptr = reinterpret_cast<UnderlyingType *>(&m_underlyingData);
assert(ptr != nullptr);
}
virtual StaticAllocator* clone() const
{
return new StaticAllocator<UnderlyingType, Args...>(*this);
}
UnderlyingType *getPtr()
{
return reinterpret_cast<UnderlyingType *>(&m_underlyingData);
}
const UnderlyingType *getPtr() const
{
return reinterpret_cast<const UnderlyingType *>(&m_underlyingData);
}
UnderlyingType *operator->()
{
return getPtr();
}
const UnderlyingType *operator->() const
{
return getPtr();
}
UnderlyingType &operator*()
{
return *getPtr();
}
const UnderlyingType &operator*() const
{
return *getPtr();
}
operator UnderlyingType *()
{
return getPtr();
}
operator const UnderlyingType *() const
{
return getPtr();
}
void construct(Args... args)
{
_construct(TupleWithRefsRemoved(args...), std::index_sequence_for<Args...>());
}
void destroy()
{
const auto ptr = getPtr();
if (ptr != nullptr)
{
ptr->~T();
}
}
private:
BaseDataType m_underlyingData[(sizeof(UnderlyingType) + sizeof(BaseDataType) - 1) / sizeof(BaseDataType)];
// A function that unpacks the tuple of arguments, and constructs them
template <std::size_t... T>
void _construct(const std::tuple<Args...>& args, std::index_sequence<T...>)
{
new (m_underlyingData) UnderlyingType(std::get<T>(args)...);
}
};
Simple Usage Example:
class InterfaceA
{
// Interface functions here
}
class InterfaceB
{
// Interface functions here
}
class ObjectA : public virtual InterfaceA
{
public:
ObjectA(InterfaceB* intrB) : m_intrB(intrB) {}
private:
InterfaceB* m_intrB;
};
class ObjectB : public virtual InterfaceB
{
public:
ObjectB(InterfaceA* intrA) : m_intrA(intrA) {}
private:
InterfaceA* m_intrA;
}
StaticAllocator<ObjectA, InterfaceB*> objectAStorage;
StaticAllocator<ObjectB, InterfaceA*> objectBStorage;
// Crashes happen in this function, there are many more objects in our real
// system and the order of the objects effects if the crash occurs.
void initialize_objects()
{
auto objA = objectAStorage.getPtr();
auto objB = objectBStorage.getPtr();
objectAStorage.construct(objB);
objectBStorage.construct(objA);
}
This answer describes the problem happening at run time, using the example of GCC. Other compilers will generate different code with similar problem as your code has an inherent the issue of lack of initialization.
Without the avoidance of dynamic memory allocation for efficiency purpose, without the generic approach, without templates, with every step decomposed, your code really boils down to:
class InterfaceA {};
class InterfaceB {};
class ObjectA : public virtual InterfaceA {
public:
ObjectA(InterfaceB *intrB) : m_intrB(intrB) {}
private:
InterfaceB *m_intrB;
};
class ObjectB : public virtual InterfaceB {
public:
ObjectB(InterfaceA *intrA) : m_intrA(intrA) {}
private:
InterfaceA *m_intrA;
};
#include <new>
void simple_init() {
void *ObjectA_mem = operator new(sizeof(ObjectA));
void *ObjectB_mem = operator new(sizeof(ObjectB));
ObjectA *premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem); // still not constructed
ObjectB *premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
InterfaceA *ia = premature_ObjectA; // derived-to-base conversion
InterfaceB *ib = premature_ObjectB;
new (ObjectA_mem) ObjectA(ib);
new (ObjectB_mem) ObjectB(ia);
}
For maximum compiled code readability, let's write that with global variables instead:
void *ObjectA_mem;
void *ObjectB_mem;
ObjectA *premature_ObjectA;
ObjectB *premature_ObjectB;
InterfaceA *ia;
InterfaceB *ib;
void simple_init() {
ObjectA_mem = operator new(sizeof(ObjectA));
ObjectB_mem = operator new(sizeof(ObjectB));
premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem); // still not constructed
premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
ia = premature_ObjectA; // derived-to-base conversion
ib = premature_ObjectB;
new (ObjectA_mem) ObjectA(ib);
new (ObjectB_mem) ObjectB(ia);
}
That gives us a very nice assembly code. We can see that the statement:
ia = premature_ObjectA; // derived-to-base conversion
compiles to:
movq premature_ObjectA(%rip), %rax
testq %rax, %rax
je .L6
movq premature_ObjectA(%rip), %rdx
movq premature_ObjectA(%rip), %rax
movq (%rax), %rax
subq $24, %rax
movq (%rax), %rax
addq %rdx, %rax
jmp .L7
.L6:
movl $0, %eax
.L7:
movq %rax, ia(%rip)
First we see that the (un-optimized) code tests for a null pointer, the equivalent of
if (premature_ObjectA == 0)
ia = 0;
else
// real stuff
The real stuff being:
movq premature_ObjectA(%rip), %rdx
movq premature_ObjectA(%rip), %rax
movq (%rax), %rax
subq $24, %rax
movq (%rax), %rax
addq %rdx, %rax
movq %rax, ia(%rip)
So a value pointed to by premature_ObjectA
is dereferenced, interpreted as a pointer, decreased by 24, the resulting pointer is used to read a value, that value is added to the original pointer premature_ObjectA
. Since the content of premature_ObjectA
is uninitialized, that obviously cannot work.
What's happening is that the compiler is fetching the vptr (vtable pointer) to read the entry at -3 "quad" (3*8 = 24) from the level 0 (a vtable like a building can have negative floors, it just means that the 0th floor isn't the lowest floor):
vtable for ObjectA:
.quad 0
.quad 0
.quad typeinfo for ObjectA
vtable for ObjectB:
.quad 0
.quad 0
.quad typeinfo for ObjectB
The vtable (of each of these objects) starts at its end, after "typeinfo for ObjectA", as we can see inside compiled code for ObjectA::ObjectA(InterfaceB*)
:
movl $vtable for ObjectA+24, %edx
...
movq %rdx, (%rax)
So during construction, the vptr is set to "floor 0" of the vtable which is before the first virtual function, at the end if there is no virtual function.
At floor -3 there is the beginning of the vtable:
vtable for ObjectA:
.quad 0
The value 0 is for "InterfaceA
is at offset 0 inside a complete ObjectA
object".
The fine details of the vtable layout will be compiler dependent, the principles:
InterfaceA
base classremains the same.
My explanation does not provide a fix: we don't even know what kind of high level problem you have and why you use these constructor argument and mutually dependent classes.
Knowing what these classes represent, we might be able to help more.