Search code examples
c++11constructorinitializationvirtual-inheritanceplacement-new

Static allocation and placement new result in null pointer dereference


I am working on an embedded platform where heap allocation is discouraged. I also have circular dependencies during construction. Given these constraints my team designed a static allocator class which is used to allocate memory in the .bss section and then construct the object in a deferred fashion.

The issue we face is during the deferred construction the compiler generated code tries to reference data in the statically allocated memory that hasn't been constructed yet - the data are zero on our platform when unconstructed - which causes a null pointer dereference crashing the system.

The crashes can be resolved by reordering the construction order of the classes. Unfortunately I haven't been able to create a minimum reproduction of the issue. Additionally the problem gets worse and harder to manage when virtual inheritance is involved.

We have experienced the issue targeting armclang and visual studio compilers so it seems like we are likely doing something out of the C++ specification.

Static Allocator Code:

template <class UnderlyingType, typename... Args>
class StaticAllocator
{
private:
    typedef std::uint64_t BaseDataType;

    // Define a tuple of the variadic template parameters with the references removed
    using TupleWithRefsRemoved = std::tuple<typename std::remove_reference<Args>::type...>;

    // A function that strips return the ref-less template arguments
    template <typename... T>
    TupleWithRefsRemoved removeRefsFromTupleMembers(std::tuple<T...> const& t)
    {
        return TupleWithRefsRemoved{ t };
    }

public:
    StaticAllocator()
    {
        const auto ptr = reinterpret_cast<UnderlyingType *>(&m_underlyingData);
        assert(ptr != nullptr);
    }

    virtual StaticAllocator* clone() const
    {
        return new StaticAllocator<UnderlyingType, Args...>(*this);
    }

    UnderlyingType *getPtr()
    {
        return reinterpret_cast<UnderlyingType *>(&m_underlyingData);
    }

    const UnderlyingType *getPtr() const
    {
        return reinterpret_cast<const UnderlyingType *>(&m_underlyingData);
    }

    UnderlyingType *operator->()
    {
        return getPtr();
    }

    const UnderlyingType *operator->() const
    {
        return getPtr();
    }

    UnderlyingType &operator*()
    {
        return *getPtr();
    }

    const UnderlyingType &operator*() const
    {
        return *getPtr();
    }

    operator UnderlyingType *()
    {
        return getPtr();
    }

    operator const UnderlyingType *() const
    {
        return getPtr();
    }

    void construct(Args... args)
    {
        _construct(TupleWithRefsRemoved(args...), std::index_sequence_for<Args...>());
    }

    void destroy()
    {
        const auto ptr = getPtr();
        if (ptr != nullptr)
        {
            ptr->~T();
        }
    }

private:
    BaseDataType m_underlyingData[(sizeof(UnderlyingType) + sizeof(BaseDataType) - 1) / sizeof(BaseDataType)];

    // A function that unpacks the tuple of arguments, and constructs them
    template <std::size_t... T>
    void _construct(const std::tuple<Args...>& args, std::index_sequence<T...>)
    {
        new (m_underlyingData) UnderlyingType(std::get<T>(args)...);
    }
};

Simple Usage Example:

class InterfaceA
{
    // Interface functions here
}

class InterfaceB
{
    // Interface functions here
}


class ObjectA : public virtual InterfaceA
{
public:
    ObjectA(InterfaceB* intrB) : m_intrB(intrB) {}

private:
    InterfaceB* m_intrB;
};

class ObjectB : public virtual InterfaceB
{
public:
    ObjectB(InterfaceA* intrA) : m_intrA(intrA) {}

private:
    InterfaceA* m_intrA;
}

StaticAllocator<ObjectA, InterfaceB*> objectAStorage;
StaticAllocator<ObjectB, InterfaceA*> objectBStorage;

// Crashes happen in this function, there are many more objects in our real
// system and the order of the objects effects if the crash occurs.
void initialize_objects()
{
    auto objA = objectAStorage.getPtr();
    auto objB = objectBStorage.getPtr();

    objectAStorage.construct(objB);
    objectBStorage.construct(objA);
}

Solution

  • This answer describes the problem happening at run time, using the example of GCC. Other compilers will generate different code with similar problem as your code has an inherent the issue of lack of initialization.

    Without the avoidance of dynamic memory allocation for efficiency purpose, without the generic approach, without templates, with every step decomposed, your code really boils down to:

    class InterfaceA {};
    
    class InterfaceB {};
    
    class ObjectA : public virtual InterfaceA {
     public:
      ObjectA(InterfaceB *intrB) : m_intrB(intrB) {}
    
     private:
      InterfaceB *m_intrB;
    };
    
    class ObjectB : public virtual InterfaceB {
     public:
      ObjectB(InterfaceA *intrA) : m_intrA(intrA) {}
    
     private:
      InterfaceA *m_intrA;
    };
    
    #include <new>
    
    void simple_init() {
      void *ObjectA_mem = operator new(sizeof(ObjectA));
      void *ObjectB_mem = operator new(sizeof(ObjectB));
      ObjectA *premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem);  // still not constructed
      ObjectB *premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
      InterfaceA *ia = premature_ObjectA;  // derived-to-base conversion
      InterfaceB *ib = premature_ObjectB;
      new (ObjectA_mem) ObjectA(ib);
      new (ObjectB_mem) ObjectB(ia);
    }
    

    For maximum compiled code readability, let's write that with global variables instead:

    void *ObjectA_mem;
    void *ObjectB_mem;
    ObjectA *premature_ObjectA;
    ObjectB *premature_ObjectB;
    InterfaceA *ia;
    InterfaceB *ib;
    
    void simple_init() {
      ObjectA_mem = operator new(sizeof(ObjectA));
      ObjectB_mem = operator new(sizeof(ObjectB));
      premature_ObjectA = static_cast<ObjectA *>(ObjectA_mem);  // still not constructed
      premature_ObjectB = static_cast<ObjectB *>(ObjectB_mem);
      ia = premature_ObjectA;  // derived-to-base conversion
      ib = premature_ObjectB;
      new (ObjectA_mem) ObjectA(ib);
      new (ObjectB_mem) ObjectB(ia);
    }
    

    That gives us a very nice assembly code. We can see that the statement:

      ia = premature_ObjectA;  // derived-to-base conversion
    

    compiles to:

            movq    premature_ObjectA(%rip), %rax
            testq   %rax, %rax
            je      .L6
            movq    premature_ObjectA(%rip), %rdx
            movq    premature_ObjectA(%rip), %rax
            movq    (%rax), %rax
            subq    $24, %rax
            movq    (%rax), %rax
            addq    %rdx, %rax
            jmp     .L7
    .L6:
            movl    $0, %eax
    .L7:
            movq    %rax, ia(%rip)
    

    First we see that the (un-optimized) code tests for a null pointer, the equivalent of

    if (premature_ObjectA == 0) 
      ia = 0;
    else
      // real stuff
    

    The real stuff being:

        movq    premature_ObjectA(%rip), %rdx
        movq    premature_ObjectA(%rip), %rax
        movq    (%rax), %rax
        subq    $24, %rax
        movq    (%rax), %rax
        addq    %rdx, %rax
        movq    %rax, ia(%rip)
    

    So a value pointed to by premature_ObjectA is dereferenced, interpreted as a pointer, decreased by 24, the resulting pointer is used to read a value, that value is added to the original pointer premature_ObjectA. Since the content of premature_ObjectA is uninitialized, that obviously cannot work.

    What's happening is that the compiler is fetching the vptr (vtable pointer) to read the entry at -3 "quad" (3*8 = 24) from the level 0 (a vtable like a building can have negative floors, it just means that the 0th floor isn't the lowest floor):

    vtable for ObjectA:
            .quad   0
            .quad   0
            .quad   typeinfo for ObjectA
    vtable for ObjectB:
            .quad   0
            .quad   0
            .quad   typeinfo for ObjectB
    

    The vtable (of each of these objects) starts at its end, after "typeinfo for ObjectA", as we can see inside compiled code for ObjectA::ObjectA(InterfaceB*):

            movl    $vtable for ObjectA+24, %edx
    ...
            movq    %rdx, (%rax)
    

    So during construction, the vptr is set to "floor 0" of the vtable which is before the first virtual function, at the end if there is no virtual function.

    At floor -3 there is the beginning of the vtable:

    vtable for ObjectA:
            .quad   0
    

    The value 0 is for "InterfaceA is at offset 0 inside a complete ObjectA object".

    The fine details of the vtable layout will be compiler dependent, the principles:

    • initialization of the vptr hidden data member (and possibly multiple other hidden members) in the constructor
    • using these hidden members during conversion to InterfaceA base class

    remains the same.

    My explanation does not provide a fix: we don't even know what kind of high level problem you have and why you use these constructor argument and mutually dependent classes.

    Knowing what these classes represent, we might be able to help more.