Search code examples
c++templatespolymorphismvtable

Avoiding vtable pointer overhead in array of known type


This is gonna be a long one, but bear with me.

Let's say I'm writing a program that deals with animals

struct animal {
    virtual ~animal() {
    }

    virtual std::string get_noise() const = 0;
};

struct dog : public animal {
    std::string get_noise() const override {
        return "Woof";
    }
};

struct cat : public animal {
    std::string get_noise() const override {
        return "Meow";
    }
};

The animal base class is useful because now we can write a function that works for any type of animal.

void make_noise(const animal& target) {
    std::cout << target.get_noise() << std::endl;
}

We can also make a factory function that returns the right animal when given its type.

std::unique_ptr<animal> make_animal(const std::string& type) {
    if (type == "dog") {
        return std::unique_ptr<animal>(new dog);
    } else {
        return std::unique_ptr<animal>(new cat);
    }
}

Let's instantiate some animals to check that everything works as expected.

dog puppy;
make_noise(puppy);

std::unique_ptr<animal> kitty = make_animal("cat");
make_noise(*kitty);

Great. I love cats. So much, in fact, that I need an array of 10 billion of them.

std::vector<cat> cats(10000000000);
make_noise(cats[0]);

But oh no! Our program throws an exception now. Because of the vtable pointer, the cat class is 8 bytes in size and storing 10 billion kittens would require 80 gigabytes of memory. Sad!

I won't give up yet though, there is a solution! Let's start by removing the animal base class from both of the derived classes.

struct dog {
    std::string get_noise() const {
        return "Woof";
    }
};

struct cat {
    std::string get_noise() const {
        return "Meow";
    }
};

Now the challenge is, how do we implement the make_noise and make_animal functions? We could write a generic make_noise function but the type of its parameters won't always be known at compile time. We need some type to represent a dynamic animal instance but since polymorphism through inheritance turned out to be no good, we'll have to improvise.

Taking inspiration from fat pointers in rust, let's manually implement a vtable!

struct animal_vtable {
    std::string (*get_noise)(const void*);
};

template<class T>
static std::string get_noise(const void* ptr) {
    return static_cast<const T*>(ptr)->get_noise();
}

struct animal {
    const void* value;
    const animal_vtable* vtable;

    template<class T>
    animal(const T& value) {
        static animal_vtable vtable = {
            ::get_noise<T>,
        };

        this->value = &value;
        this->vtable = &vtable;
    }

    std::string get_noise() const {
        return vtable->get_noise(value);
    }
};

Let's ignore for a moment that this now only works for const animals, and also that we'll need another class that is basically identical to animal except using a non-const animal pointer if we want it to work for those as well.

At least now we can implement our make_noise function again.

void make_noise(animal target) {
    std::cout << target.get_noise() << std::endl;
}

That doesn't look too bad, and now it works for 10 billion cats with only 10 gigabytes of memory.

dog puppy;
make_noise(puppy);

std::vector<cat> cats(10000000000);
make_noise(cats[0]);

But we have another problem, implementing make_animal. We can't simply return a std::unique_ptr<animal> because the animal::value pointer wouldn't get properly destroyed. We'd have to add a destructor to our vtable and have yet another class that is basically identical to animal except with a destructor that calls the destructor from our vtable.

At this point the code is becoming unwieldy. I am not satisfied with how poorly the custom "fat" pointer works with standard library types such as std::unique_ptr. And with the manual vtables it almost feels like we're programming in C again.

So finally, for my question, is there a better way? I don't mind writing some boilerplate template metaprogramming magic if that helps keep the rest of my code clean.

C++ is said to be a language where you "don't pay for what you don't use", and yet I'm having to pay for storing the vtable 10 billion times in my cats array even though I know at compile time that the vtable will always be that of a cat, or risk turning my code into an unreadable mess.

Thank you for making it to the end.

I'm going to bed now.


Solution

  • Because of the vtable pointer, the cat class is 8 bytes in size and storing 10 billion kittens would require 80 gigabytes of memory. ... So finally, for my question, is there a better way? I don't mind writing some boilerplate template metaprogramming magic if that helps keep the rest of my code clean.

    If storing object type inside each object is not practical, another way is to store the object type in the pointer.

    In 64-bit address space, you can allocate your objects of different types in such a way, that a plain pointer encodes the object type naturally.

    Say, you'd like to handle up to 16Mi of 256-byte cats and up to 8Mi of 512-byte dogs - 4GiB pool per object type. You reserve one 8GiB region of virtual address space with mmap, and allocate cats from the first half of the region, dogs - from the second. Due to demand paging, page frames (physical RAM) are allocated only for pages of the region which were stored into.

    A pointer to an object is a sum of region's base address and an offset. With 4GiB (2³²) object pools, the high 32 bits of the 64-bit offset naturally encode the object type, 0 for cats and 1 for dogs.

    The non-polymorphic base class has to implement run-time polymorphism of each otherwise virtual function.

    Here is a working sketch:

    #include <memory>
    #include <new>
    #include <iostream>
    #include <cstdint>
    #include <utility>
    
    #include <boost/interprocess/anonymous_shared_memory.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    
    // Compile-time map of class types to integer.
    template<class T> using Type = std::common_type<T>;
    template<class T> void type_to_pool_id(Type<T>); // Use ADL to find type_to_pool_id declarations in namespaces of T.
    
    template<uintptr_t N_POOLS, uintptr_t POOL_SIZE>
    struct PoolAllocator {
        struct Pool {
            struct Free { Free* prev; };
            Free* free = 0; // A stack of deallocated elements.
            size_t allocated = 0;
    
            Pool() noexcept = default;
            Pool(Pool const&) = delete;
            Pool& operator=(Pool const&) = delete;
    
            void deallocate(void* p) noexcept {
                free = new (p) Free{free}; // Stack push.
            }
    
            void* allocate(size_t size, uintptr_t pool) {
                if(free)
                    return std::exchange(free, free->prev); // Stack pop.
                auto allocated2 = allocated + std::max(size, sizeof(Free));
                if(allocated2 > POOL_SIZE)
                    throw std::bad_alloc();
                return reinterpret_cast<void*>(pool + std::exchange(allocated, allocated2));
            }
        };
    
        boost::interprocess::mapped_region reserved_region_{boost::interprocess::anonymous_shared_memory(N_POOLS * POOL_SIZE)};
        Pool pools_[N_POOLS];
    
        uintptr_t get_pool_id(void* p) const noexcept {
            auto offset = reinterpret_cast<uintptr_t>(p) - reinterpret_cast<uintptr_t>(reserved_region_.get_address());
            return offset / POOL_SIZE;
        }
    
        void deallocate(uintptr_t pool_id, void* p) noexcept {
            pools_[pool_id].deallocate(p);
        }
    
        void* allocate(uintptr_t pool_id, size_t size) {
            return pools_[pool_id].allocate(size, reinterpret_cast<uintptr_t>(reserved_region_.get_address()) + POOL_SIZE * pool_id);
        }
    
        template<class T>
        void destroy(T& t) noexcept(noexcept(t.~T())) {
            void* p = &t;
            t.~T();
            deallocate(get_pool_id(p), p);
        }
    
        template<class T, class... Args>
        T* create(Args&&... args) {
            constexpr auto pool_id = decltype(type_to_pool_id(Type<T>{}))::value; // Use ADL to find type_to_pool_id declarations in namespaces of T.
            void* p = allocate(pool_id, sizeof(T));
            try {
                return new (p) T{std::forward<Args>(args)...};
            }
            catch(...) {
                deallocate(pool_id, p); // Don't leak memory should T constructor throw.
                throw;
            }
        }
    };
    
    // Non-polymorphic base class.
    struct Animal {
        void destroy(); // Implements polymorphic behaviour.
    
        std::string get_noise(); // Implements polymorphic behaviour.
    
        template<class R, class F>
        R visit(F&& f); // Downcast to the derived class.
    
        using Pool = PoolAllocator<2, uintptr_t{1} << 32>;
        static Pool* pool; // All derived objects are allocated from this pool.
    
    protected:
        ~Animal() noexcept = default; // Non-polymorphic, destroy must be used instead.
    };
    Animal::Pool* Animal::pool = 0;
    
    void Animal::destroy() { // Implements polymorphic behaviour.
        visit<void>([](auto& derived) { pool->destroy(derived); });
    }
    
    std::string Animal::get_noise() { // Implements polymorphic behaviour.
        return visit<std::string>([](auto& derived) { return derived.get_noise(); });
    }
    
    // unique_ptr support.
    struct AnimalDeleter {
        void operator()(Animal* a) const {
            if(a)
                a->destroy();
        }
    };
    template<class T>
    using AnimalPtr = std::unique_ptr<T, AnimalDeleter>;
    
    // Derived classes.
    struct Cat : Animal {
        std::string get_noise() { return "Meow"; }
        ~Cat() { std::cout << __PRETTY_FUNCTION__ << '\n'; }
    };
    struct Dog : Animal {
        std::string get_noise() { return "Woof"; }
        ~Dog() { std::cout << __PRETTY_FUNCTION__ << '\n'; }
    };
    
    // Compile-time map of class types to integer.
    std::integral_constant<int, 0> type_to_pool_id(Type<Cat>);
    std::integral_constant<int, 1> type_to_pool_id(Type<Dog>);
    
    // Downcast to the derived class. Requires all derived class definitions to be available at this point.
    template<class R, class F>
    R Animal::visit(F&& f) {
        switch(pool->get_pool_id(this)) {
        case decltype(type_to_pool_id(Type<Cat>{}))::value: return f(static_cast<Cat&>(*this));
        case decltype(type_to_pool_id(Type<Dog>{}))::value: return f(static_cast<Dog&>(*this));
        default: throw;
        }
    }
    
    int main() {
        // Setup the pool in main.
        Animal::Pool pool;
        Animal::pool = &pool;
    
        AnimalPtr<Cat> cat(pool.create<Cat>());
        std::cout << cat->get_noise() << '\n';
        AnimalPtr<Animal> base_cat = move(cat);
        std::cout << base_cat->get_noise() << '\n';
    
        AnimalPtr<Dog> dog(pool.create<Dog>());
        std::cout << dog->get_noise() << '\n';
        AnimalPtr<Animal> base_dog = move(dog);
        std::cout << base_dog->get_noise() << '\n';
    }
    

    Outputs:

    Meow
    Meow
    Woof
    Woof
    Dog::~Dog()
    Cat::~Cat()
    

    Better Code: Runtime Polymorphism - Sean Parent explains the ideas in greater detail.