Before the performance people tear my head off: yes, I have done profiling before asking this :)
I'm once again looking at my one-of-a-type container, and while I have a solution that works, the performance is poor because each type of item that's cached results in a separate heap allocation (which is of course expensive).
Based on static analysis of my program's input, I can determine the total size required by all of the objects that might be put in the cache object that's getting passed around. That is, I have a list of the objects that may be constructed in a given cache object, so I know in advance how much I might have to cache -- but only at runtime, not at compile time.
What I'd like to do is what boost::make_shared does: get a single memory block and construct both the shared_ptr bookkeeping and the controlled object in that same block. I don't have to worry about preserving copying behavior, since the cache object is noncopyable and clients pass it around by pointer (it's usually stored in something like a ptr_vector or a std::auto_ptr).
I'm not familiar, however, with how exactly one would implement such a container -- in particular, how to honor alignment restrictions and such.
In pseudocode, what I'd like to do:
//I know a lot of what's in here is not portable -- I need to run only on x86
//and x64 machines. Yes, this couple of classes looks hacky, but I'd rather
//have one hacky class than a whole program full of them :)
class CacheRegistrar
{
    //Blah blah
public:
    //Figures out what objects will be in the cache, etc.
    const std::vector<std::size_t>& GetRequiredObjectSizes() const;
    //Other stuff...
    template <typename T>
    void RegisterCacheObject();
    template <typename T>
    std::size_t GetObjectIndex() const;
    // etc.
};

class CacheObject;

std::auto_ptr<CacheObject> CacheObjectFactory(const CacheRegistrar& registrar)
{
    //Pretend this is in a CPP file and therefore CacheObject is defined...
    const std::vector<std::size_t>& sizes(registrar.GetRequiredObjectSizes());
    std::size_t sumOfCache =
        std::accumulate(sizes.begin(), sizes.end(), std::size_t(0));
    sumOfCache += sizeof(CacheObject);
    boost::scoped_array<char> buffer(new char[sumOfCache]);
    CacheObject *obj = new (reinterpret_cast<void *>(buffer.get())) CacheObject;
    buffer.release(); //PSEUDOCODE (boost::scoped_array has no release member)
    return std::auto_ptr<CacheObject>(obj); //Nothrow
}

class CacheObject
{
    CacheRegistrar *registrar; //Set by my constructor
public:
    template <typename T>
    T& Get()
    {
        char *startOfCache = reinterpret_cast<char *>(this) +
            sizeof(CacheObject);
        char *cacheItem = startOfCache + registrar->GetObjectIndex<T>();
        return *reinterpret_cast<T*>(cacheItem);
    }
};
Is my general concept sound here? Is there a better way of accomplishing this?
But first, read this article by Andrei Alexandrescu on what he thinks he should have written in that chapter -- a way to build heaps using Heap Layers (by yours truly). I used Heap Layers to build Hoard, DieHard, and DieHarder, as well as the custom allocators used in our OOPSLA 2002 paper, Reconsidering Custom Memory Allocation, which you should also read before embarking on creating a custom allocator.